Friday, August 30, 2013

Installing PostgreSQL on Windows: The "Secondary Logon" service is not running.


Error 


The "Secondary Logon" service is not running. This service is required for the installer to initialize the database. Please start this service and try again.

Solution 
    As the error clearly states, the PostgreSQL installer needs the Secondary Logon service to be up and running. To start the service, go to:


home-> Control panel -> administrative tools -> services 

then find the Secondary Logon service and start it via right-click.
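Alternatively, the service can be started from an elevated command prompt; a quick sketch, assuming the standard internal service name seclogon:

net start seclogon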

Then continue with your installation.
     

Thursday, August 29, 2013

MongoDB-Hadoop integration and data processing with Apache Pig: running a Pig script with Node.js


I have weather data in a MongoDB database. As an experiment, I'm porting that data into the Hadoop environment and processing it with Pig.

Using the mongodb-hadoop connector, Pig can read from and write to MongoDB. Once the output is written back into MongoDB, users can view the result data from a browser.

I've created a simple web application with MongoDB and Node.js using the Express framework, and the Pig script is also executed from within Node.



trades collection data

>  use week6                                      
> db.trades.findOne()
{
 "_id" : ObjectId("51b05fbce3600f7b48448eda"),
 "ticker" : "abcd",
 "time" : ISODate("2012-03-03T04:24:13.003Z"),
 "price" : 110,
 "shares" : 200,
 "ticket" : "z135" // this key used for group in pig script
}



Pig script


root@boss[bin]#vi cntWthr.pig

-- MongoDB Java driver
REGISTER  /opt/hadoop-1.1.0/lib/mongo-2.10.1.jar;
-- Core Mongo-Hadoop Library
REGISTER /opt/bigdata/mongo-hadoop/core/target/mongo-hadoop-core_1.1.2-1.1.0.jar;
-- mongo-hadoop pig support
REGISTER /opt/bigdata/mongo-hadoop/pig/target/mongo-hadoop-pig_1.1.2-1.1.0.jar;

trades = LOAD 'mongodb://localhost:27017/week6.trades' using com.mongodb.hadoop.pig.MongoLoader; 
grp = GROUP trades by $0#'ticket';
cnt = FOREACH grp GENERATE group,COUNT(trades);
--dump cnt;
STORE cnt INTO 'mongodb://localhost:27017/mongo_hadoop.yield_historical.outt' USING com.mongodb.hadoop.pig.MongoInsertStorage('group:float,cnt:int', 'group');


In the above script:

  • LOAD the MongoDB data from the week6 database, trades collection.
  • GROUP the loaded data on the key ticket.
  • GENERATE the COUNT for each ticket.
  • Instead of displaying the result on the console or storing it into HDFS, STORE it back into MongoDB, into the mongo_hadoop database and the yield_historical.outt collection (a standalone run command is shown below).
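Before wiring this into Node, the script can be run on its own from $PIG_HOME/bin to confirm the MongoDB load and store work; a minimal check, assuming pig is on the PATH and Hadoop is running as configured above:

root@boss[bin]#pig cntWthr.pig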
Node.js 

root@boss[bin]#vi mongoHadoopNode.js


// This script must be run from $PIG_HOME/bin and the file cntWthr.pig must exist in the same path.
var express = require('express'),     
    app = express(),
    cons = require('consolidate'), 
    mongoClient = require('mongodb').MongoClient,
    Server  = require('mongodb').Server;
// Configuring view template
app.engine('html',cons.swig);
app.set('view engine','html');
app.set('views', __dirname + "/views");
// Running the Pig script file
var spawn = require('child_process').spawn,
    runPig = spawn('pig',['cntWthr.pig']);

// Handling the output of the Pig process
runPig.stdout.on('data',function(data){
    console.log('stdout : '+data);
});
runPig.stderr.on('data',function(data){
    console.log('stderr : '+data );
});

// Read the result document written by the Pig job and render it
app.get('/',function(req,res){
    mongoClient.connect('mongodb://localhost:27017/mongo_hadoop',function(err,db){
        if(err) throw err;
        db.collection('yield_historical.outt').findOne({},function(err,doc){
            res.render('template',{'Group':doc.group,'Value':doc.val_0});
            db.close();
        });
    });
});

app.get('*',function(req,res){
    res.send("Page Not Found !!!! ");
});

// Open a client connection, then start the Express server
var mongo = new mongoClient(new Server('localhost',27017));
mongo.open(function(err,mongo){
    if(err) throw err;
    app.listen(8000);
    console.log("Express server started successfully localhost:8000");
});

A simple HTML page ( views/template.html )

MongoDB-Hadoop integration and Job processing with PIG 
Group : {{Group}}  Value : {{Value}}

Run Node.js:
root@boss[bin]#node mongoHadoopNode.js
Express server started successfully localhost:8000

A new MongoDB collection is created under the mongo_hadoop database:
>  use mongo_hadoop
> db.yield_historical.outt.findOne()
{ "_id" : ObjectId("521f0ebf908dfe1853af7c01"), "group" : "z135", "val_0" : NumberLong(1667) }

Enter the URL in a browser, which will show output like:
http://localhost:8000

MongoDB-Hadoop integration and Job processing with PIG
Group : z447 
Value : 834
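The same page can also be fetched from a shell instead of a browser, if that is easier to check:

curl http://localhost:8000/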


Executing an Apache Pig script from Node.js

My previous post describes how to create the mongodb-hadoop connector.

Once that is done,
  •  put the mongo-hadoop-core_1.1.2-1.1.0 connector jar into $HADOOP_HOME/lib
  •  download the latest version of the MongoDB Java driver and put it into $HADOOP_HOME/lib 
In both cases, the Node script (i.e. the runPig.js file) should be in $PIG_HOME/bin.
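For reference, the copy steps might look something like this; the paths follow the layout used earlier in this post and the driver jar is assumed to be in the current directory:

root@boss[mongo-hadoop]#cp core/target/mongo-hadoop-core_1.1.2-1.1.0.jar $HADOOP_HOME/lib/
root@boss[mongo-hadoop]#cp mongo-2.10.1.jar $HADOOP_HOME/lib/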

method 1
root@boss:/opt/bigdata/pig-0.11.1/bin>vi runPig.js

// Spawn the Pig process with the script file as its argument
var spawn = require('child_process').spawn;
var runPig = spawn('pig',['cntWthr.pig']);

// Stream the Pig logs to the Node console as they are produced
runPig.stdout.on('data',function(data){
    console.log('stdout : '+data);
});
runPig.stderr.on('data',function(data){
    console.log('stderr : '+data + " process Home : "+process.env.HOME);
});

root@boss:/opt/bigdata/pig-0.11.1/bin>node runPig.js

method 2

root@boss:/opt/bigdata/pig-0.11.1/bin>vi runPig.js
// Run the whole script via a shell command and print its stdout when it finishes
var sys = require('sys');
var exec = require('child_process').exec;
function puts(error, stdout, stderr) { sys.puts(stdout); }
exec("pig -f cntWthr.pig", puts);

root@boss:/opt/bigdata/pig-0.11.1/bin>node runPig.js

Method 2 will not show any log in the console, but method 1 will show the log as it executes in the Pig environment.

Tuesday, August 27, 2013

Issues with MongoDB and Hadoop Integration


I just tried to create the connector for the MongoDB NoSQL database with Hadoop 0.21.0.

While doing so I got a few errors that were solvable, and a hadoop-core unresolved dependency that was not. I then decided to create the connector for Hadoop 1.1.0.

After downloading it from git, I did:

ERROR 1)


root@boss[mongo-hadoop]# ./sbt package


java.lang.RuntimeException: Hadoop Release '%s' is an invalid/unsupported release.  Valid entries are in 0.21.0
 at scala.sys.package$.error(package.scala:27)
 at MongoHadoopBuild$$anonfun$streamingSettings$6$$anonfun$apply$8.apply(MongoHadoopBuild.scala:176)
 at MongoHadoopBuild$$anonfun$streamingSettings$6$$anonfun$apply$8.apply(MongoHadoopBuild.scala:176)
 at scala.collection.MapLike$class.getOrElse(MapLike.scala:122)
 at scala.collection.immutable.HashMap.getOrElse(HashMap.scala:38)
 at MongoHadoopBuild$$anonfun$streamingSettings$6.apply(MongoHadoopBuild.scala:176)
 at MongoHadoopBuild$$anonfun$streamingSettings$6.apply(MongoHadoopBuild.scala:175)
 at scala.Function1$$anonfun$compose$1.apply(Function1.scala:49)
 at scala.Function1$$anonfun$compose$1.apply(Function1.scala:49)
 at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$single$1.apply(INode.scala:159)
 at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$single$1.apply(INode.scala:159)
 at sbt.EvaluateSettings$MixedNode.evaluate0(INode.scala:177)
 at sbt.EvaluateSettings$INode.evaluate(INode.scala:132)
 at sbt.EvaluateSettings$$anonfun$sbt$EvaluateSettings$$submitEvaluate$1.apply$mcV$sp(INode.scala:64)
 at sbt.EvaluateSettings.sbt$EvaluateSettings$$run0(INode.scala:73)
 at sbt.EvaluateSettings$$anon$3.run(INode.scala:69)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
[error] Hadoop Release '%s' is an invalid/unsupported release.  Valid entries are in 0.21.0
[error] Use 'last' for the full log.

SOLUTION
I simply changed "0.21.0" to "0.21" in build.sbt:



root@boss[mongo-hadoop]#vi build.sbt 
.......
.......
hadoopRelease in ThisBuild := "0.21"


ERROR 2)
Then I executed the same command:

root@boss[mongo-hadoop]# ./sbt package


module not found: org.apache.hadoop#hadoop-core;0.21.0
[warn] ==== local: tried
[warn]   /root/.ivy2/local/org.apache.hadoop/hadoop-core/0.21.0/ivys/ivy.xml
[warn] ==== Simile Repo at MIT: tried
[warn]   http://simile.mit.edu/maven/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== Cloudera Repository: tried
[warn]   https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== Maven.Org Repository: tried
[warn]   http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== releases: tried
[warn]   https://oss.sonatype.org/content/repositories/releases/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[warn] ==== public: tried
[warn]   http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.21.0/hadoop-core-0.21.0.pom
[info] Resolving org.specs2#specs2-scalaz-core_2.9.2;6.0.1 ...
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: org.apache.hadoop#hadoop-core;0.21.0: not found
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: unresolved dependency: org.apache.hadoop#hadoop-core;0.21.0: not found

SOLUTION

Surfing around the web, I concluded the issue was related to port 443, but even after port 443 was opened the error remained.

Then I went through all of the repository links above; none of them actually has a hadoop-core-0.21.0.jar file.

---

Then I tried to create the connector for Hadoop version 1.1.0, and it was created successfully. You can also download the MongoDB connector for Hadoop 1.1 here.
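For Hadoop 1.1.0, the change is presumably again the hadoopRelease setting in build.sbt; a sketch, assuming "1.1" is an accepted value (if it is not, sbt prints the list of valid entries, as in ERROR 1 above):

root@boss[mongo-hadoop]#vi build.sbt
.......
hadoopRelease in ThisBuild := "1.1"

root@boss[mongo-hadoop]# ./sbt package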

I'm working with MongoDB + Hadoop with Pig and will post any issues or an example workflow.
 

---

Working with MongoDB + Hadoop with Pig, I followed the Treasury Yield Calculation example from here.

1) mongoimport of the .json data (a sample command is sketched after this list)
2) downloaded the piggybank-0.3-amzn jar from s3://elasticmapreduce/libs/pig/0.3/piggybank-0.3-amzn.jar
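The import step might look like this; the database, collection and file names follow the Treasury Yield example and are assumptions here:

mongoimport --db mongo_hadoop --collection yield_historical.in --file yield_historical_in.json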

While executing

grunt> REGISTER /opt/bigdata/mongo-hadoop/piggybank-0.3-amzn.jar  
...... 
   grunt> date_tenyear = foreach  raw generate UnixToISO($0#'_id'), $0#'bc10Year';

I got the error below:


ERROR 1070: Could not resolve org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Then I looked inside piggybank-0.3-amzn.jar; there is nothing like UnixToISO() in it.

This time I registered piggybank.jar from $PIG_HOME/contrib/piggybank/java:


grunt> REGISTER /opt/bigdata/pig-0.10.0/contrib/piggybank/java/piggybank.jar;
....

After this, all the steps from the example mentioned earlier worked; Pig is able to read from and write to MongoDB.






Thursday, August 15, 2013

Data transfer from one table in a PostgreSQL database to another table in a different database.


Transfer Data between databases with PostgreSQL
      
      pg_dump sourceDB  -t fromTbl -c -s | psql -h 192.16.3.2 targetDB;

(Note: -s dumps the schema only; drop that flag if the table's data should be transferred as well.)

 Problem with pg_dump
    pg_dump: server version: 9.x.x ; pg_dump version: 9.y.y
    pg_dump: aborting because of server version mismatch

The above command is well suited when both servers run the same version of PostgreSQL.
If the source and target servers run different versions, you have to use the COPY command to
copy data from one table in a PostgreSQL database to the corresponding table in a different database running on a different server,

i.e. a cross-database copy/transfer of data in PostgreSQL.

General syntax for the cross-database copy command:

psql -c "copy (select list of column  from table_name ) to stdin " dbanme | psql -c "table_name(specify the column ) from stdout " targetDB

This covers cross-database data transfer even when:
  • the target table already exists (it must exist beforehand)
  • the table/relation names differ
  • the source and target databases follow different table schemas
  • only a few columns need to be copied/transferred from the source table to the target database
  • the servers run different versions of PostgreSQL

Example:
In sourceDB: table employee(eid, ename, esalary, edesignation)
In targetDB: table staff(sid, sname, spay)
Now we would like to transfer data between these two databases:


psql -c " copy ( select eid, ename, esalary from employee) to stdin " sourceDB | psql -c " copy staff(sid,sname,spay) from stdout " targetDB


Data from the employee table in the source database is copied/transferred to the staff table in targetDB.
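A quick way to confirm the transfer, using the table and database names from the example above:

psql -c "select count(*) from staff" targetDB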
 note :
  • a single command does both the backup from one database and the restore into another
  • we moved only 3 columns from the employee table, not all of them
  • the data types of the corresponding columns in the two tables must be the same
  • if the source and target databases are on different servers, use the -h option in psql
  • ex :
    • psql -h 127.0.0.1 -c "copy (select eid,ename from emp ) to stdout " sourceDB | psql -h 192.168.37.2 -c "copy staff(sid,sname) from stdin " targetDB