This is a continuation post; you can find Spark issues part 1 here.
I have a Hadoop cluster set up and decided to deploy Apache Spark on YARN. As a test case, I tried different options to submit Spark jobs. Here I discuss a few exceptions/issues I hit during the Spark deployment on YARN.
Error 1)
Steps to reproduce
val file = sc.textFile("hdfs://master:9000/sparkdata/file2.txt")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
val arr = counts.collect()
arr.saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1")
:19: error: value saveAsTextFile is not a member of Array[(String, Int)]
arr.saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1")
Solution
The error occurs on the bolded line above. It is caused by trying to store a plain array value to HDFS. In Scala for Spark, the data should be in an RDD (Resilient Distributed Dataset) so that the variable can use Spark-related objects and methods. In this case, just convert the array into an RDD (replace the bolded line with):
sc.makeRDD(arr).saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1")
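The root cause is that collect() pulls the RDD's contents back to the driver as a plain Scala Array, which has none of the RDD methods. As a minimal sketch (plain Scala, no Spark required; the object name and sample lines are my own), the same word-count aggregation can be expressed on a local collection:

```scala
// Plain-Scala sketch of the word-count aggregation the Spark job performs.
// After collect(), you hold an ordinary Array just like this code does;
// only wrapping it back into an RDD (sc.makeRDD) restores methods such
// as saveAsTextFile.
object WordCountSketch {
  // Count occurrences of each word across the input lines.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))               // same shape as RDD flatMap
      .groupBy(identity)                   // local stand-in for reduceByKey
      .map { case (w, ws) => (w, ws.size) }

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("hello spark hello", "spark on yarn")))
}
```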
Error 2)
When I ran the above word-count example, I also got this error:
WARN TaskSetManager: Lost task 1.1 in stage 5.0 (TID 47, boss): org.apache.hadoop.hdfs.BlockMissingException:
Could not obtain block: BP-1474416393-10.184.36.194-1406741613979:blk_1073742690_1866 file=sparkdata/file2.txt
Solution
I was reading data from the Hadoop HDFS filesystem, and my DataNode was down. I started the DataNode alone with:
root@boss:/opt/hadoop-2.2.0# ./sbin/hadoop-daemon.sh start datanode
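Before (or after) restarting the daemon, the missing block can be confirmed from the HDFS side. A sketch of the usual checks, assuming the standard Hadoop CLI tools are on the PATH (the file path is the one from the error above):

```shell
# List running Java daemons; a missing DataNode here explains
# a BlockMissingException.
jps

# Ask the NameNode for a cluster report (live/dead DataNodes, capacity).
hdfs dfsadmin -report

# Check the blocks of the affected file and where they are (or are not) stored.
hdfs fsck /sparkdata/file2.txt -files -blocks -locations
```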
Error 3)
My NodeManager kept going down. I tried many times to start it up with:
root@solaiv[hadoop-2.5.1]# ./sbin/yarn-daemon.sh start nodemanager
FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager java.lang.NoClassDefFoundError: org/apache/hadoop/http/HttpServer2$Builder
Solution
I checked the Hadoop classpath:
root@boss:/opt/hadoop-2.5.1# ./bin/hadoop classpath
A few jar files were still referring to the old version of Hadoop, i.e. hadoop-2.2.0. I corrected this by changing HADOOP_HOME to point to the latest hadoop-2.5.1.
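The fix can be sketched as shell steps; the install paths are the ones used in this post, and the grep check is my own addition:

```shell
# Point HADOOP_HOME (and the classpath Hadoop derives from it)
# at the new install instead of the old one.
export HADOOP_HOME=/opt/hadoop-2.5.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# Re-check the classpath: no hadoop-2.2.0 jars should remain.
$HADOOP_HOME/bin/hadoop classpath | tr ':' '\n' | grep '2.2.0'

# Then start the NodeManager again.
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
```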