Tuesday, August 27, 2013

Issues with MongoDB and Hadoop Integration


I just tried to build the MongoDB (mongo-hadoop) connector against Hadoop 0.21.0.

While doing so I hit a few errors that were solvable, plus one unresolved hadoop-core dependency that was not. That eventually led me to build the connector for Hadoop 1.1.0 instead.

After downloading the source from git, I ran the build.
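For reference, the checkout itself: a minimal sketch assuming the official mongodb/mongo-hadoop repository on GitHub (the working directory is my guess).

root@boss[bigdata]# git clone https://github.com/mongodb/mongo-hadoop.git
root@boss[bigdata]# cd mongo-hadoop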

ERROR 1)


root@boss[mongo-hadoop]# ./sbt package


java.lang.RuntimeException: Hadoop Release '%s' is an invalid/unsupported release.  Valid entries are in 0.21.0
 at scala.sys.package$.error(package.scala:27)
 at MongoHadoopBuild$$anonfun$streamingSettings$6$$anonfun$apply$8.apply(MongoHadoopBuild.scala:176)
 at MongoHadoopBuild$$anonfun$streamingSettings$6$$anonfun$apply$8.apply(MongoHadoopBuild.scala:176)
 at scala.collection.MapLike$class.getOrElse(MapLike.scala:122)
 at scala.collection.immutable.HashMap.getOrElse(HashMap.scala:38)
 at MongoHadoopBuild$$anonfun$streamingSettings$6.apply(MongoHadoopBuild.scala:176)
 at MongoHadoopBuild$$anonfun$streamingSettings$6.apply(MongoHadoopBuild.scala:175)
 at scala.Function1$$anonfun$compose$1.apply(Function1.scala:49)
 at scala.Function1$$anonfun$compose$1.apply(Function1.scala:49)
 at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$single$1.apply(INode.scala:159)
 at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$single$1.apply(INode.scala:159)
 at sbt.EvaluateSettings$MixedNode.evaluate0(INode.scala:177)
 at sbt.EvaluateSettings$INode.evaluate(INode.scala:132)
 at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$submitEvaluate$1.apply$mcV$sp(INode.scala:64)
 at sbt.EvaluateSettings.sbt$EvaluateSettings$$run0(INode.scala:73)
 at sbt.EvaluateSettings$$anon$3.run(INode.scala:69)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
[error] Hadoop Release '%s' is an invalid/unsupported release.  Valid entries are in 0.21.0
[error] Use 'last' for the full log.

SOLUTION
I simply changed "0.21.0" to "0.21" in build.sbt:



root@boss[mongo-hadoop]# vi build.sbt
.......
.......
hadoopRelease in ThisBuild := "0.21"


ERROR 2)
Then I executed the same command again:

root@boss[mongo-hadoop]# ./sbt package


module not found: org.apache.hadoop#hadoop-core;0.21.0
[warn] ==== local: tried
[warn]   /root/.ivy2/local/org.apache.hadoop/hadoop-core/0.21.0/ivys/ivy.xml
[warn] ==== Simile Repo at MIT: tried
[warn]   http://simile.mit.edu/maven/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== Cloudera Repository: tried
[warn]   https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== Maven.Org Repository: tried
[warn]   http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== releases: tried
[warn]   https://oss.sonatype.org/content/repositories/releases/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== public: tried
[warn]   http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[info] Resolving org.specs2#specs2-scalaz-core_2.9.2;6.0.1 ...
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: org.apache.hadoop#hadoop-core;0.21.0: not found
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: unresolved dependency: org.apache.hadoop#hadoop-core;0.21.0: not found

SOLUTION

Surfing around the web, I first concluded the issue was related to port 443; but even after port 443 was opened, the error remained.

Then I went through every repository link in the log above, and none of them hosts a hadoop-core-0.21.0.jar. The 0.21 release split hadoop-core into separate hadoop-common, hadoop-hdfs, and hadoop-mapred artifacts, so this dependency cannot be resolved from any repository.
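A quick way to verify (my own check, not part of the original build output):

root@boss[mongo-hadoop]# curl -I http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom

The request comes back 404 Not Found: the artifact was never published.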

---

Then I tried building the connector for Hadoop version 1.1.0, and it was created successfully. You can also download the MongoDB connector for Hadoop 1.1 here.
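The only build change needed was pointing hadoopRelease at the 1.1 line in build.sbt. A sketch, assuming the build script accepts a "1.1" key the same way it accepts "0.21":

root@boss[mongo-hadoop]# vi build.sbt
.......
hadoopRelease in ThisBuild := "1.1"

root@boss[mongo-hadoop]# ./sbt package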

I'm now working with MongoDB + Hadoop through Pig, and will post any issues or an example workflow here.
 

---

Working with MongoDB + Hadoop through Pig, I followed the Treasury Yield Calculation example from here.

1) imported the sample .json data with mongoimport (a sketch follows this list)
2) downloaded the piggybank-0.3-amzn jar from s3://elasticmapreduce/libs/pig/0.3/piggybank-0.3-amzn.jar
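For step 1, a minimal mongoimport invocation might look like this; the database, collection, and file names are my assumptions based on the Treasury Yield example, not the original run:

root@boss[mongo-hadoop]# mongoimport --db demo --collection yield_historical.in --file yield_historical_in.json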

While executing the following:

grunt> REGISTER /opt/bigdata/mongo-hadoop/piggybank-0.3-amzn.jar;
......
grunt> date_tenyear = foreach raw generate UnixToISO($0#'_id'), $0#'bc10Year';

I got the error below:


ERROR 1070: Could not resolve org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Then I inspected piggybank-0.3-amzn.jar itself: it contains no UnixToISO() class at all.

So this time I registered the piggybank.jar that ships with Pig under $PIG_HOME/contrib/piggybank/java:


grunt> REGISTER /opt/bigdata/pig-0.10.0/contrib/piggybank/java/piggybank.jar;
....

After this, all the remaining steps from the example above worked, and I was able to read from and write to MongoDB through Pig.
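For completeness, here is a sketch of the full Pig session. The connector jar paths, the demo database/collection names, and the MongoInsertStorage arguments are assumptions modeled on the mongo-hadoop example docs; adjust them to your build.

grunt> REGISTER /opt/bigdata/mongo-hadoop/core/target/mongo-hadoop-core_1.1.2-1.1.0.jar;
grunt> REGISTER /opt/bigdata/mongo-hadoop/pig/target/mongo-hadoop-pig_1.1.2-1.1.0.jar;
grunt> REGISTER /opt/bigdata/pig-0.10.0/contrib/piggybank/java/piggybank.jar;
grunt> DEFINE UnixToISO org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO();
grunt> raw = LOAD 'mongodb://localhost/demo.yield_historical.in' USING com.mongodb.hadoop.pig.MongoLoader;
grunt> date_tenyear = FOREACH raw GENERATE UnixToISO($0#'_id'), $0#'bc10Year';
grunt> parsed_year = FOREACH date_tenyear GENERATE SUBSTRING($0, 0, 4) AS year, (double)$1 AS bc;
grunt> by_year = GROUP parsed_year BY (chararray)year;
grunt> year_10yearavg = FOREACH by_year GENERATE group, AVG(parsed_year.bc) AS tenyear_avg;
grunt> STORE year_10yearavg INTO 'mongodb://localhost/demo.yield_historical.out' USING com.mongodb.hadoop.pig.MongoInsertStorage('group:chararray,tenyear_avg:double', 'group');

MongoLoader with no schema hands each document over as a single map, which is why fields are read with $0#'key' lookups; the final STORE writes the per-year averages back into a new collection.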






2 comments:

anjali gautam said...

Which particular version of the connector did you download for Hadoop 1.1?

solaimurugan v said...

I used:
mongo-hadoop-core_1.1.2-1.1.0.jar