Tuesday, August 27, 2013

Issues with MongoDB and Hadoop Integration


I just tried to build the MongoDB (NoSQL) connector for Hadoop 0.21.0.

While doing so I got a few errors that were solvable, and one unresolved hadoop-core dependency that was unsolvable. I then decided to build the connector for Hadoop 1.1.0 instead.

After cloning the source from git, I ran:

ERROR 1)


root@boss[mongo-hadoop]# ./sbt package


java.lang.RuntimeException: Hadoop Release '%s' is an invalid/unsupported release.  Valid entries are in 0.21.0
 at scala.sys.package$.error(package.scala:27)
 at MongoHadoopBuild$$anonfun$streamingSettings$6$$anonfun$apply$8.apply(MongoHadoopBuild.scala:176)
 at MongoHadoopBuild$$anonfun$streamingSettings$6$$anonfun$apply$8.apply(MongoHadoopBuild.scala:176)
 at scala.collection.MapLike$class.getOrElse(MapLike.scala:122)
 at scala.collection.immutable.HashMap.getOrElse(HashMap.scala:38)
 at MongoHadoopBuild$$anonfun$streamingSettings$6.apply(MongoHadoopBuild.scala:176)
 at MongoHadoopBuild$$anonfun$streamingSettings$6.apply(MongoHadoopBuild.scala:175)
 at scala.Function1$$anonfun$compose$1.apply(Function1.scala:49)
 at scala.Function1$$anonfun$compose$1.apply(Function1.scala:49)
 at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$single$1.apply(INode.scala:159)
 at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$single$1.apply(INode.scala:159)
 at sbt.EvaluateSettings$MixedNode.evaluate0(INode.scala:177)
 at sbt.EvaluateSettings$INode.evaluate(INode.scala:132)
 at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$submitEvaluate$1.apply$mcV$sp(INode.scala:64)
 at sbt.EvaluateSettings.sbt$EvaluateSettings$$run0(INode.scala:73)
 at sbt.EvaluateSettings$$anon$3.run(INode.scala:69)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
[error] Hadoop Release '%s' is an invalid/unsupported release.  Valid entries are in 0.21.0
[error] Use 'last' for the full log.

SOLUTION
I simply changed "0.21.0" to "0.21" in build.sbt:



root@boss[mongo-hadoop]# vi build.sbt
.......
.......
hadoopRelease in ThisBuild := "0.21"


ERROR 2)
Then I executed the same command again:

root@boss[mongo-hadoop]# ./sbt package


module not found: org.apache.hadoop#hadoop-core;0.21.0
[warn] ==== local: tried
[warn]   /root/.ivy2/local/org.apache.hadoop/hadoop-core/0.21.0/ivys/ivy.xml
[warn] ==== Simile Repo at MIT: tried
[warn]   http://simile.mit.edu/maven/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== Cloudera Repository: tried
[warn]   https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== Maven.Org Repository: tried
[warn]   http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== releases: tried
[warn]   https://oss.sonatype.org/content/repositories/releases/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== public: tried
[warn]   http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[info] Resolving org.specs2#specs2-scalaz-core_2.9.2;6.0.1 ...
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: org.apache.hadoop#hadoop-core;0.21.0: not found
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: unresolved dependency: org.apache.hadoop#hadoop-core;0.21.0: not found

SOLUTION

Surfing around the web, I first concluded the issue was related to port 443, but even after port 443 was opened the error remained.

Then I went through all of the repository links above and could not find a hadoop-core-0.21.0.jar in any of them; the artifact simply is not published in those repositories.

---

Then I built the connector for Hadoop version 1.1.0, and it was created successfully. You can also download the MongoDB connector for Hadoop 1.1 here.
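For reference, the only change I made before rebuilding was the hadoopRelease key. A minimal sketch, assuming the build accepts "1.1" as a release key (check the valid entries listed in the project's MongoHadoopBuild.scala for your checkout):

root@boss[mongo-hadoop]# vi build.sbt
.......
hadoopRelease in ThisBuild := "1.1"
.......
root@boss[mongo-hadoop]# ./sbt package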

I'm now working with MongoDB + Hadoop through Pig, and will post any issues or an example workflow here.

---

Working with MongoDB + Hadoop through Pig, I followed the Treasury Yield Calculation example from here.

1) Imported the .json data with mongoimport (a sample command follows this list)
2) Downloaded the piggybank-0.3-amzn jar from s3://elasticmapreduce/libs/pig/0.3/piggybank-0.3-amzn.jar
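For step 1, a minimal mongoimport sketch; the database and collection names (demo, yield_historical.in) and the file name follow the Treasury Yield example, so treat them as assumptions and adjust to your setup:

root@boss[mongo-hadoop]# mongoimport --db demo --collection yield_historical.in --file yield_historical_in.json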

While executing:

grunt> REGISTER /opt/bigdata/mongo-hadoop/piggybank-0.3-amzn.jar
......
grunt> date_tenyear = foreach raw generate UnixToISO($0#'_id'), $0#'bc10Year';

I got the error below:


ERROR 1070: Could not resolve org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Then I looked through piggybank-0.3-amzn.jar and could not find any class like UnixToISO in it.
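A quick way to check this is to list the jar contents; a sketch, assuming the jar sits in the current directory:

root@boss[mongo-hadoop]# jar tf piggybank-0.3-amzn.jar | grep UnixToISO

No output means the class is absent from that jar.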

This time I registered the piggybank.jar that ships with Pig under $PIG_HOME/contrib/piggybank/java:


grunt> REGISTER /opt/bigdata/pig-0.10.0/contrib/piggybank/java/piggybank.jar;
....

After this, all the remaining steps from the example above worked, and I was able to read from and write to MongoDB through Pig.
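For completeness, here is a sketch of the whole Pig flow as I ran it. The jar paths, the mongodb:// URLs, and the mongo-hadoop Pig classes (com.mongodb.hadoop.pig.MongoLoader, com.mongodb.hadoop.pig.MongoStorage) follow the Treasury Yield example and are assumptions; adjust them to your build and database:

grunt> REGISTER /opt/bigdata/pig-0.10.0/contrib/piggybank/java/piggybank.jar;
grunt> REGISTER /opt/bigdata/mongo-hadoop/core/target/mongo-hadoop-core_1.1.2-1.1.0.jar;
grunt> REGISTER /opt/bigdata/mongo-hadoop/pig/target/mongo-hadoop-pig_1.1.2-1.1.0.jar;
grunt> REGISTER /opt/bigdata/mongo-hadoop/mongo-2.10.1.jar;
grunt> DEFINE UnixToISO org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO();
grunt> raw = LOAD 'mongodb://localhost:27017/demo.yield_historical.in' USING com.mongodb.hadoop.pig.MongoLoader;
grunt> date_tenyear = FOREACH raw GENERATE UnixToISO($0#'_id') AS dt, (double)$0#'bc10Year' AS bc10;
grunt> STORE date_tenyear INTO 'mongodb://localhost:27017/demo.yield_historical.out' USING com.mongodb.hadoop.pig.MongoStorage();

The DEFINE line gives the piggybank UDF a short alias, which avoids the ERROR 1070 import-resolution problem seen earlier.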






2 comments:

Unknown said...

Which particular version of connector did you download for Hadoop 1.1?

dataanalytics said...

I used
mongo-hadoop-core_1.1.2-1.1.0.jar