Friday, October 18, 2013

Hadoop multi-node cluster configuration : Error & Solution



Here I've listed a few errors, with solutions, that I hit while configuring a multi-node Hadoop cluster.

Environment
OS : Debian 7 (wheezy)  / BOSS 5.0 / Ubuntu 12.0
Hadoop Version : 1.1.0
ERROR 1)
hduser is not in the sudoers file. This incident will be reported.
hduser@solaiv[softwares]$sudo vi /etc/profile 
[sudo] password for hduser: 
hduser is not in the sudoers file.  This incident will be reported.  
SOLUTION 1)
In a root terminal, add hduser to the sudoers file:
root@solaiv[~]#echo 'hduser ALL=(ALL) ALL' >> /etc/sudoers
(A safer alternative is to run visudo as root, which checks the file's syntax before saving.)

ERROR 2) 
jps command not found

hduser@solaiv[softwares]$sudo jps
jps command not found
SOLUTION 2)
jps is available under the JAVA_HOME/bin directory. Create an alias for it:
hduser@solaiv:/opt/softwares/hadoop-1.1.0/bin$ alias jps='/opt/softwares/jdk1.6.0_18/bin/jps'
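The alias above lasts only for the current shell session. To keep it across sessions, a minimal sketch is to append it to ~/.bashrc (the JDK path below is the one used in this post; adjust it to your own install):

```shell
# Persist the jps alias by appending it to ~/.bashrc.
# JDK_BIN is this post's JDK location -- change it to match your system.
JDK_BIN=/opt/softwares/jdk1.6.0_18/bin
echo "alias jps='${JDK_BIN}/jps'" >> ~/.bashrc
. ~/.bashrc   # reload so the alias is defined in the current shell
```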

ERROR 3)
Once I had finished configuring Hadoop, I checked the running processes with the jps command. Only the NameNode, SecondaryNameNode and DataNode were running; the JobTracker and TaskTracker were not. I opened the JobTracker log file, and the error was:

JOB TRACKER : Can not start task tracker because java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 127.0.0.1:9001

ERROR 4)
In the TaskTracker log file, the error was:

TASK TRACKER : FATAL org.apache.hadoop.mapred.JobTracker: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 127.0.0.1:9001


SOLUTION 3 & 4)
I was sure I had made a mistake in mapred-site.xml; after going through it line by line I found a silly one: a space between the value and the closing XML tag.
mapred-site.xml file
<property> <name>mapred.job.tracker</name><value>127.0.0.1:9001 </value> </property>
 
# space between 9001 and closing value tag
I fixed it by removing that space in mapred-site.xml:

<property><name>mapred.job.tracker</name><value>127.0.0.1:9001</value> </property>
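For context, the property has to sit inside the file's top-level configuration element; a minimal mapred-site.xml matching this post's setup would look like the sketch below (value as in the post; adjust host and port to your cluster):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- no whitespace inside the value element -->
    <value>127.0.0.1:9001</value>
  </property>
</configuration>
```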

ERROR 5)

Cannot lock storage /app/hadoop/tmp/dfs/name. The directory is already locked.
SOLUTION 5)
A)
Check your dfs.name.dir and dfs.data.dir path in hdfs-site.xml
or B)
Stop all the running Hadoop processes and start them again:
hduser@solaiv:/opt/softwares/hadoop-1.1.0/bin$ ./stop-all.sh
..
hduser@solaiv:/opt/softwares/hadoop-1.1.0/bin$ ./start-all.sh
or C)
If Error 5 persists even after the step above, stop all the processes, format the NameNode and start again. (Note: formatting the NameNode erases all data stored in HDFS.)

hduser@solaiv:/opt/softwares/hadoop-1.1.0/bin$ ./stop-all.sh
..
hduser@solaiv:/opt/softwares/hadoop-1.1.0/bin$ hadoop namenode -format
...
hduser@solaiv:/opt/softwares/hadoop-1.1.0/bin$ ./start-all.sh

ERROR 6) 

ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.io.FileNotFoundException: /app/hadoop/tmp/dfs/name/in_use.lock (Permission denied) NameNode not started
SOLUTION 6)
Change the owner to hduser:hadoop and grant full permissions on dfs.data.dir, i.e. /app/hadoop/tmp:

hduser@solaiv:/opt/softwares/hadoop-1.1.0/bin$chown hduser:hadoop -R /app/hadoop/tmp
hduser@solaiv:/opt/softwares/hadoop-1.1.0/bin$ chmod 777 -R /app/hadoop/tmp

ERROR 7)
jps showed only the NameNode and SecondaryNameNode, so I started the DataNode manually:

hduser@solaiv[bin]$./hadoop datanode
ERROR datanode.DataNode: All directories in dfs.data.dir are invalid.
SOLUTION 7)
First delete all contents of the temporary folder (note: this removes any HDFS data stored there):


rm -rf /app/hadoop/tmp/
Make sure the directory /app/hadoop/tmp/ has the right owner and permissions:
hduser@solaiv[bin]$sudo chown hduser:hadoop -R /app/hadoop/tmp/
hduser@solaiv[bin]$sudo chmod 777 -R /app/hadoop/tmp/
format the namenode:

hduser@solaiv[bin]$./hadoop namenode -format
Start all processes again: 

hduser@solaiv[bin]$./start-all.sh

ERROR 8) 

hduser@solaiv[bin]$./hadoop fs -mkdir solaiv
mkdir: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/hduser/solaiv. Name node is in safe mode.
SOLUTION 8)
The namenode normally leaves safe mode on its own once enough blocks have been reported; you can check its state and, if needed, force it out:
hduser@solaiv[bin]$./hadoop dfsadmin -safemode get
hduser@solaiv[bin]$./hadoop dfsadmin -safemode leave

ERROR 9)

...............
...............
13/10/08 17:15:24 ERROR datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 1580775695; datanode namespaceID = 1494801914
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:397)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:307)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1644)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1583)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1601)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1727)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1744)
13/10/08 17:15:24 INFO datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG:
Shutting down DataNode at boss/127.0.0.1
************************************************************/
SOLUTION 9) 
The NameNode generates a new namespaceID every time you format HDFS, but the DataNode keeps the old one in its VERSION file, so the two no longer match.

root@boss[bin]#vi /app/hadoop/tmp/dfs/data/current/VERSION
Manually update namespaceID from 1494801914 to 1580775695 (the value the namenode reports) and save the file:

#Mon Oct 14 14:52:19 IST 2013
namespaceID=1580775695
storageID=DS-469635027-127.0.0.1-50010-1376974011263
cTime=1377582559943
storageType=DATA_NODE
layoutVersion=-32

ERROR 10)
I got this Error after I resolved Error 9)

13/10/08 17:24:33 ERROR datanode.DataNode: java.io.IOException: Datanode state: LV = -32 CTime = 1377582559943 is newer than the namespace state: LV = -32 CTime = 0
SOLUTION 10)
Open the same VERSION file and change the value of cTime to 0, matching the namespace state:

root@boss[bin]#vi /app/hadoop/tmp/dfs/data/current/VERSION
#Mon Oct 14 14:52:19 IST 2013
namespaceID=1580775695
storageID=DS-469635027-127.0.0.1-50010-1376974011263
cTime=0
storageType=DATA_NODE
layoutVersion=-32
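Both VERSION-file edits (Solutions 9 and 10) can also be scripted with sed instead of editing by hand. The sketch below works on a sample copy under /tmp for illustration; on a real datanode the file is /app/hadoop/tmp/dfs/data/current/VERSION, and the IDs are whatever your own logs report:

```shell
# Create a sample VERSION file under /tmp (stand-in for
# /app/hadoop/tmp/dfs/data/current/VERSION on a real datanode).
cat > /tmp/VERSION <<'EOF'
#Mon Oct 14 14:52:19 IST 2013
namespaceID=1494801914
storageID=DS-469635027-127.0.0.1-50010-1376974011263
cTime=1377582559943
storageType=DATA_NODE
layoutVersion=-32
EOF
# Solution 9: replace the stale namespaceID with the namenode's value.
sed -i 's/^namespaceID=.*/namespaceID=1580775695/' /tmp/VERSION
# Solution 10: reset cTime to 0 to match the namespace state.
sed -i 's/^cTime=.*/cTime=0/' /tmp/VERSION
# Verify both edits:
grep -E '^(namespaceID|cTime)=' /tmp/VERSION
# prints: namespaceID=1580775695
#         cTime=0
```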