Wednesday, June 18, 2014

Error & Solution : Automatic Failover configuration (HDFS High Availability for Hadoop 2.X)

Error & Solution : Automatic Failover configuration (HDFS High Availability for Hadoop 2.X)
This is the continue post on Error & Solution during setup Hadoop HA

Here I have discussed few error / issues during Automatic Failover configuration
a part of the Hadoop HA setup.

Error 1)
If you are converting a non-HA NameNode to be HA, you should run the command "hdfs namenode -initializeSharedEdits", which will initialize the JournalNodes with the edits data from the local NameNode edits directories
root@solaiv[bin]#./hdfs namenode -initializeSharedEdits

ERROR namenode.NameNode: Could not initialize shared edits dir java.io.IOException: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /app/hadoop2/namenode state: NON_EXISTENT
Solution
create namenode dir in
root@boss[bin]#mkdir -P /app/hadoop2/namenode


Error 2)

root@solaiv[bin]#./hdfs namenode -initializeSharedEdits

namenode.NameNode: Could not initialize shared edits dir The directory is already locked;
Solution
make sure full permission to hadoop.dir for namenode, datanode and journalnode
root@boss[bin]#chmod 777 -R /app/hadoop2/
I have configured all the dirs under /app/hadoop2

root@boss[bin]#ls -l /app/hadoop2/

drwxrwxrwx 2 root root 4096 Nov 29 12:27 datanode
drwxrwxrwx 3 root root 4096 Nov 28 19:38 jn
drwxrwxrwx 3 root root 4096 Nov 29 12:32 namenode


Error 3)
This time when i run the initializeSharedEdits on standby node,
root@standby[bin]#hdfs namenode -initializeSharedEdits

14/06/03 14:42:28 ERROR namenode.NameNode: Could not initialize shared edits dir java.io.FileNotFoundException: No valid image files found at org.apache.hadoop.hdfs.server.namenode.
FSImageTransactionalStorageInspector.
getLatestImages(FSImageTransactionalStorageInspector.java:144)
Solution
Error due to standby node couldn't sync with active namenode
format the satndby namenode
standby@hadoop[bin]#hdfs namenode -format


Error 4)
in order to Initialize standby node. Format standby node namenode and copy the latest checkpoint (FSImage) from master to standby by executing the following command:
root@standby[bin]#hdfs namenode -bootstrapStandby
This command connects with master node to get the namespace metadata and the checkpointed fsimage. This command also ensures that standby node receives sufficient editlogs from the JournalNodes (corresponding to the fsimage). This command fails if JournalNodes are not correctly initialized and cannot provide the required editlogs.
root@standby[bin]#hdfs namenode -bootstrapStandby

org.apache.hadoop.hdfs.qjournal.protocol.
JournalNotFormattedException: Journal Storage Directory /app/hadoop2/jn/mycluster not formatted

10.184.39.147:8485: Journal Storage Directory /app/hadoop2/jn/mycluster not formatted at org.apache.hadoop.hdfs.qjournal.server.Journal.
checkFormatted(Journal.java:453) at org.apache.hadoop.hdfs.qjournal.server.Journal.
getEditLogManifest(Journal.java:636) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.
getEditLogManifest(JournalNodeRpcServer.java:181) …... FATAL ha.BootstrapStandby: Unable to read transaction ids 3-13784 from the configured shared edits storage qjournal://master:8485;standby:8485/mycluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node. Error: Gap in transactions. Expected to be able to read up until at least txid 13784 but unable to find any edit logs containing txid 3
Solution
I finally solved this by copying data for a 'good' journal node (aka, from 'master') to the unformatted one (aka, standby where i was getting error)
root@master[bin]#scp -r /app/hadoop2/jn/mycluster/ root@standby:/app/hadoop2/jn/
then restarted the journanl node.

root@standby[bin]#../sbin/hadoop-daemon.sh start journalnode

root@standby[bin]#hdfs namenode -bootstrapStandby


Related posts

Error and Solution - Hadoop HA
distributed Hadoop setup
Issue while setup Hadoop cluster
Post a Comment