This is a follow-up post on errors and solutions encountered while setting up Hadoop HA.
Here I have discussed a few errors/issues that came up during Automatic Failover configuration, part of the Hadoop HA setup.
Error 1)
If you are converting a non-HA NameNode to HA, you should run the command "hdfs namenode -initializeSharedEdits", which initializes the JournalNodes with the edits data from the local NameNode edits directories.
root@solaiv[bin]#./hdfs namenode -initializeSharedEdits
ERROR namenode.NameNode: Could not initialize shared edits dir java.io.IOException: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /app/hadoop2/namenode state: NON_EXISTENT
Solution
Create the namenode directory:
root@boss[bin]#mkdir -p /app/hadoop2/namenode
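For reference, the directory created above must match what dfs.namenode.name.dir points to in hdfs-site.xml. A minimal sketch, assuming the path used throughout this post:

```xml
<!-- hdfs-site.xml: local storage for NameNode metadata.
     The value mirrors the directory created above; adjust to your layout. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/app/hadoop2/namenode</value>
</property>
```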
Error 2)
root@solaiv[bin]#./hdfs namenode -initializeSharedEdits
namenode.NameNode: Could not initialize shared edits dir The directory is already locked;
Solution
Make sure the hadoop dirs for the namenode, datanode and journalnode have full permissions:
root@boss[bin]#chmod 777 -R /app/hadoop2/
I have configured all the dirs under /app/hadoop2
root@boss[bin]#ls -l /app/hadoop2/
drwxrwxrwx 2 root root 4096 Nov 29 12:27 datanode
drwxrwxrwx 3 root root 4096 Nov 28 19:38 jn
drwxrwxrwx 3 root root 4096 Nov 29 12:32 namenode
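As a quick sanity check outside Hadoop, the permission fix above can be sketched on a scratch directory (paths here are illustrative, not the real /app/hadoop2):

```shell
# Illustrative only: recreate the HA storage layout on a temp directory and
# open up the permissions, mirroring the chmod 777 -R fix above. In production,
# chown-ing the tree to the hadoop user is safer than world-writable dirs.
base=$(mktemp -d)
mkdir -p "$base/namenode" "$base/datanode" "$base/jn"
chmod -R 777 "$base"
mode=$(stat -c '%a' "$base/namenode")   # GNU stat; prints the octal mode
echo "$mode"
rm -rf "$base"
```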
Error 3)
This time, when I ran initializeSharedEdits on the standby node,
root@standby[bin]#hdfs namenode -initializeSharedEdits
14/06/03 14:42:28 ERROR namenode.NameNode: Could not initialize shared edits dir java.io.FileNotFoundException: No valid image files found at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:144)
Solution
The error occurs because the standby node couldn't sync with the active namenode. Format the standby namenode:
standby@hadoop[bin]#hdfs namenode -format
Error 4)
To initialize the standby node, format its namenode and copy the latest checkpoint (FSImage) from the master to the standby by executing the following command:
root@standby[bin]#hdfs namenode -bootstrapStandby
This command connects with master node to get the namespace metadata and the checkpointed fsimage. This command also ensures that standby node receives sufficient editlogs from the JournalNodes (corresponding to the fsimage). This command fails if JournalNodes are not correctly initialized and cannot provide the required editlogs.
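bootstrapStandby reads the shared edits location from hdfs-site.xml. A minimal sketch of the relevant properties, assuming the qjournal URI and paths that appear in the error output in this post:

```xml
<!-- hdfs-site.xml: where the NameNodes read/write shared edits, and where
     each JournalNode stores them locally. Values mirror the URIs and paths
     seen in the logs below; adjust hosts and nameservice to your cluster. -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://master:8485;standby:8485/mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/app/hadoop2/jn</value>
</property>
```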
root@standby[bin]#hdfs namenode -bootstrapStandby
org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory /app/hadoop2/jn/mycluster not formatted
10.184.39.147:8485: Journal Storage Directory /app/hadoop2/jn/mycluster not formatted
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:453)
at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:636)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:181)
…...
FATAL ha.BootstrapStandby: Unable to read transaction ids 3-13784 from the configured shared edits storage qjournal://master:8485;standby:8485/mycluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node. Error: Gap in transactions. Expected to be able to read up until at least txid 13784 but unable to find any edit logs containing txid 3
Solution
I finally solved this by copying the data from a 'good' JournalNode (i.e., from 'master') to the unformatted one (i.e., the standby where I was getting the error):
root@master[bin]#scp -r /app/hadoop2/jn/mycluster/ root@standby:/app/hadoop2/jn/
Then I restarted the JournalNode and reran the bootstrap:
root@standby[bin]#../sbin/hadoop-daemon.sh start journalnode
root@standby[bin]#hdfs namenode -bootstrapStandby