Wednesday, June 18, 2014

Error & Solution : Automatic Failover configuration (HDFS High Availability for Hadoop 2.X)

This is a continuation of my earlier Error & Solution post on Hadoop HA setup.

Here I discuss a few errors/issues encountered during Automatic Failover configuration, which is part of the Hadoop HA setup.

Error 1)
If you are converting a non-HA NameNode to HA, you should run the command "hdfs namenode -initializeSharedEdits", which initializes the JournalNodes with the edits data from the local NameNode edits directories.
root@solaiv[bin]#./hdfs namenode -initializeSharedEdits

ERROR namenode.NameNode: Could not initialize shared edits dir
java.io.IOException: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /app/hadoop2/namenode state: NON_EXISTENT
Solution
Create the namenode directory:
root@boss[bin]#mkdir -p /app/hadoop2/namenode
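
This directory has to match the NameNode storage path configured in hdfs-site.xml. A minimal sketch, assuming the standard dfs.namenode.name.dir property points at the path from my setup:

<!-- hdfs-site.xml: local directory where the NameNode stores its metadata -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/app/hadoop2/namenode</value>
</property>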


Error 2)

root@solaiv[bin]#./hdfs namenode -initializeSharedEdits

namenode.NameNode: Could not initialize shared edits dir The directory is already locked;
Solution
Make sure the Hadoop directories for the namenode, datanode, and journalnode have full permissions:
root@boss[bin]#chmod -R 777 /app/hadoop2/
I have configured all the directories under /app/hadoop2:

root@boss[bin]#ls -l /app/hadoop2/

drwxrwxrwx 2 root root 4096 Nov 29 12:27 datanode
drwxrwxrwx 3 root root 4096 Nov 28 19:38 jn
drwxrwxrwx 3 root root 4096 Nov 29 12:32 namenode
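
A quick note: chmod 777 is fine for a test setup like this one, but on a real cluster you would normally give ownership of these directories to the user running the Hadoop daemons instead of opening them to everyone. A hedged alternative, assuming the daemons run as user hdfs in group hadoop (user/group names are assumptions, not from my setup):

root@boss[bin]#chown -R hdfs:hadoop /app/hadoop2/
root@boss[bin]#chmod -R 755 /app/hadoop2/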


Error 3)
This time, when I ran initializeSharedEdits on the standby node:
root@standby[bin]#hdfs namenode -initializeSharedEdits

14/06/03 14:42:28 ERROR namenode.NameNode: Could not initialize shared edits dir
java.io.FileNotFoundException: No valid image files found
at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:144)
Solution
This error occurs because the standby node couldn't sync with the active namenode.
Format the standby namenode:
standby@hadoop[bin]#hdfs namenode -format
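
Keep in mind that -format wipes the namenode metadata, so only run it on a standby that holds no data yet. Both namenodes in an HA pair must share the same clusterID to join the same cluster; a quick check of the VERSION file (the path follows from the namenode dir in my setup):

root@master[bin]#grep clusterID /app/hadoop2/namenode/current/VERSION
root@standby[bin]#grep clusterID /app/hadoop2/namenode/current/VERSION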


Error 4)
In order to initialize the standby node, format the standby namenode and copy the latest checkpoint (fsimage) from the master to the standby by executing the following command:
root@standby[bin]#hdfs namenode -bootstrapStandby
This command connects to the master node to get the namespace metadata and the checkpointed fsimage. It also ensures that the standby node receives sufficient editlogs from the JournalNodes (corresponding to the fsimage). The command fails if the JournalNodes are not correctly initialized and cannot provide the required editlogs.
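
For reference, the shared edits location the command reads from is set by dfs.namenode.shared.edits.dir in hdfs-site.xml; a minimal sketch, assuming the qjournal URI that appears in the error output below:

<!-- hdfs-site.xml: shared edits directory on the JournalNode quorum -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://master:8485;standby:8485/mycluster</value>
</property>

In my case the bootstrap failed because the JournalNode directory on the standby was not formatted: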
root@standby[bin]#hdfs namenode -bootstrapStandby

org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory /app/hadoop2/jn/mycluster not formatted

10.184.39.147:8485: Journal Storage Directory /app/hadoop2/jn/mycluster not formatted
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:453)
at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:636)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:181)
…...
FATAL ha.BootstrapStandby: Unable to read transaction ids 3-13784 from the configured shared edits storage qjournal://master:8485;standby:8485/mycluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node.
Error: Gap in transactions. Expected to be able to read up until at least txid 13784 but unable to find any edit logs containing txid 3
Solution
I finally solved this by copying the data from a 'good' journal node (i.e., from 'master') to the unformatted one (i.e., the standby where I was getting the error):
root@master[bin]#scp -r /app/hadoop2/jn/mycluster/ root@standby:/app/hadoop2/jn/
Then I restarted the journal node and re-ran the bootstrap:

root@standby[bin]#../sbin/hadoop-daemon.sh start journalnode

root@standby[bin]#hdfs namenode -bootstrapStandby
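
Once bootstrapStandby succeeds, the automatic failover pieces can be verified end to end. A hedged sketch, assuming the NameNode IDs are nn1 and nn2 (the IDs are not shown in my configuration) and that ha.zookeeper.quorum is already set for the ZKFC:

Initialize the failover state in ZooKeeper (run once, from one namenode):
root@master[bin]#hdfs zkfc -formatZK

Start the ZKFC daemon on both namenodes:
root@master[bin]#../sbin/hadoop-daemon.sh start zkfc
root@standby[bin]#../sbin/hadoop-daemon.sh start zkfc

Check which namenode is active and which is standby:
root@master[bin]#hdfs haadmin -getServiceState nn1
root@master[bin]#hdfs haadmin -getServiceState nn2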


Related posts

Error and Solution - Hadoop HA
distributed Hadoop setup
Issue while setup Hadoop cluster
