Saturday, November 28, 2015

Apache Spark : how to start worker across cluster


For the Apache Spark cluster setup, and if you run into any issue, refer to the Spark over Yarn post.

How do you start a worker node on newly added Spark slaves?
Spark has two slave start-up scripts under the sbin/ directory:

start-slaves.sh : starts the workers on all slave machines. Run it from the master node.

start-slave.sh : starts the Worker daemon on one individual slave. Run it on each slave node. Example:

sbin/start-slave.sh spark://10.184.48.55:7077
The above command must be run on the slave machine. Here 10.184.48.55 is the host where the Spark Master is running.
Error
shutting down Netty transport
sbin/start-slave.sh spark://10.184.48.55:7077
15/11/27 19:53:53 ERROR NettyTransport: failed to bind to /10.184.48.183:0, shutting down Netty transport
Solution
The error is caused by improper configuration in /etc/hosts. Set SPARK_LOCAL_IP so that it points to the local worker system.

export SPARK_LOCAL_IP=127.0.0.1

OR
export SPARK_LOCAL_IP=IP_ADDR_OF_THE_SYSTEM
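
As a rough sketch of the second option, the worker's own address can be derived instead of typed by hand. The helper name first_non_loopback_ip is hypothetical, and the usage line assumes a Linux hostname command that supports the -I flag. Note that 127.0.0.1 is generally only suitable when the master and the worker run on the same machine.

```shell
# Hypothetical helper: print the first non-loopback IPv4 address from a
# space-separated list (for example, the output of `hostname -I` on Linux).
first_non_loopback_ip() {
    for addr in $1; do
        case "$addr" in
            127.*) ;;   # skip loopback addresses
            *:*) ;;     # skip IPv6 addresses
            *) printf '%s\n' "$addr"; return 0 ;;
        esac
    done
    return 1            # nothing usable found
}

# Usage on a worker (assumes Linux `hostname -I`):
# SPARK_LOCAL_IP=$(first_non_loopback_ip "$(hostname -I)") && export SPARK_LOCAL_IP
# sbin/start-slave.sh spark://10.184.48.55:7077
```

If the machine has several interfaces, the first address may not be the one the master can reach, so check the value before exporting it.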

Friday, November 27, 2015

RPostgresql : how to pass dynamic parameter to dbGetQuery statement

RPostgresql : R and PostgreSQL Database

Working with RPostgreSQL package

How to pass a dynamic / runtime parameter to dbGetQuery in RPostgreSQL?
# use stri_paste to form a query and pass it into dbGetQuery
icd = 'A09'
require(stringi)

qry <- stri_paste("SELECT * FROM visualisation.ipd_disease_datamart WHERE icd ='", icd, "'",collapse="")

rs1 <- dbGetQuery(con, qry)
Error
 
 Error in postgresqlNewConnection(drv, ...) : 
  RS-DBI driver: (cannot allocate a new connection -- 
maximum of 16 connections already opened)
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")

con <- dbConnect(drv, dbname="DBName", host="127.0.0.1",port=5432,user="yes",password="yes")
Solution
Close the connection. At most 16 connections can be open from R to PostgreSQL at a time; once this limit is exceeded, dbConnect throws the error above.
##list all the connections
dbListConnections(drv)

## Closes the connection
dbDisconnect(con)

## Frees all the resources on the driver
dbUnloadDriver(drv)
#OR on.exit(dbUnloadDriver(drv), add = TRUE)


How to close/drop all the connections of a PostgreSQL session?

We can terminate PostgreSQL connections using the pg_terminate_backend SQL function.
In my case I had opened 16 connections using RPostgreSQL and unfortunately forgot to release them,
so I ended up exceeding the maximum connection limit.
SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE client_addr = '10.184.36.131' and pid > 20613 AND pid <> pg_backend_pid();
In the above query, pg_stat_activity returns the list of all active connections.
I terminated only the connections made from my R session, identified by the client_addr IP (10.184.36.131 in the query).
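
The same clean-up can be run from a terminal. This is only a sketch: the helper name terminate_sql_for_client is hypothetical, the usage line assumes the psql client is installed, and the pid > 20613 filter from the query above is left out on purpose. The address is pasted into the SQL unescaped, so pass only trusted values.

```shell
# Hypothetical helper: print SQL that terminates every backend opened from
# one client address, except the session that runs the statement itself.
terminate_sql_for_client() {
    client_addr=$1
    printf "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE client_addr = '%s' AND pid <> pg_backend_pid();\n" "$client_addr"
}

# Usage (assumes psql is installed; host, user and database are the
# placeholder values from the dbConnect call in this post):
# terminate_sql_for_client 10.184.36.131 | psql -h 127.0.0.1 -U yes -d DBName
```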

Friday, November 20, 2015

Working with Association in R : arules and arulesViz package

working with association in R : arules package

working with Association in R using arules and arulesViz packages

When I tried to visualize the top five rules,
plot(highLiftRules,method="graph",control=list(type="items"))
Error in as.double(y) : 
  cannot coerce type 'S4' to vector of type 'double'
Solution
Load the arulesViz library into the R session:
library(arulesViz)
While loading the library I got this error:
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
  namespace "lattice" 0.20-24 is already loaded, but >= 0.20.27 is required
Error: package or namespace load failed for "arulesViz"
Solution
The error says that I have an outdated package that needs to be upgraded. To list the installed packages and their status:
inst = packageStatus()$inst

inst[inst$Status != "ok", c("Package", "Version", "Status")]

# lists all packages whose status is not "ok" (the installed version is not the current one)

old.packages()

# lists the installed version and the current version of each outdated package

install.packages("lattice")
# upgrades lattice to the current version

unloadNamespace("lattice")
# then restarting the R session resolves the error

detach_package("lattice", TRUE)
# detach_package is a user-defined helper (not part of base R); it unloads the package without restarting the R session
Finally, I got the plot output.


Thursday, November 19, 2015

Working with RHadoop


Error

hdfs.ls("/tamil")

Error in .jcall("java/lang/Class", "Ljava/lang/Class;", "forName", cl, :
  No running JVM detected. Maybe .jinit() would help.
Error in .jfindClass(as.character(class)) :
  No running JVM detected. Maybe .jinit() would help.

Solution


hdfs.init()

hdfs.ls("/")

Error

hdfs.init()
sh: 1: /media/bdalab/bdalab/sw/hadoop-2.7.1/bin: Permission denied
Error in .jnew("org/apache/hadoop/conf/Configuration") :
  java.lang.ClassNotFoundException
In addition: Warning message:
running command '/media/bdalab/bdalab/sw/hadoop-2.7.1/bin classpath' had status 126
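Solution
As the "Permission denied" line shows, HADOOP_CMD was pointing to the bin directory rather than to the hadoop executable. Set it to the full path of the hadoop binary and set JAVA_HOME as well: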


Sys.setenv(HADOOP_CMD='/hadoop-2.7.1/bin/hadoop')
Sys.setenv(JAVA_HOME='/jdk1.8.0_60/')
To check the value of an environment variable:
Sys.getenv("HADOOP_CMD")
hdfs.init()
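
Since the error above came from HADOOP_CMD naming a directory, a quick check in the terminal before starting R can catch it early. This is a minimal sketch; the helper name is_hadoop_cmd is hypothetical.

```shell
# Hypothetical helper: succeed only if the given path is an executable
# regular file (a directory such as .../bin fails the check).
is_hadoop_cmd() {
    [ -f "$1" ] && [ -x "$1" ]
}

# Usage (the path is the one used in Sys.setenv above):
# is_hadoop_cmd /hadoop-2.7.1/bin/hadoop || echo "HADOOP_CMD is not an executable file" >&2
```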



RHadoop integration issues

Installing the RHadoop packages for working with R and Hadoop

Installing the rJava package in R
install.packages("rJava_0.9-7.tar.gz", repos = NULL)
Error
configure: error: Java Development Kit (JDK) is missing or not registered in R
Make sure R is configured with full Java support (including JDK). Run
R CMD javareconf
as root to add Java support to R.
If you don't have root privileges, run
R CMD javareconf -e
to set all Java-related variables and then install rJava.
ERROR: configuration failed for package 'rJava'
* removing '/home/bdalab/R/x86_64-pc-linux-gnu-library/3.1/rJava'
Solution
Install the packaged rJava:
sudo apt-get install r-cran-rjava
Then try again. The package installed successfully on my system, but I still got the above error. I then checked java -version: it was pointing to OpenJDK instead of Oracle Java HotSpot, so I switched to HotSpot.
Error
library(rJava) 
## only in RStudio; in the terminal it works fine.

Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object 'x86_64-pc-linux-gnu-library/3.1/rJava/libs/rJava.so':
  libjvm.so: cannot open shared object file: No such file or directory
Error: loading failed
Execution halted
ERROR: loading failed
Solution
Install the packaged rJava:
sudo apt-get install r-cran-rjava

Solution
Locate libjvm.so and create a symbolic link to it in /usr/lib/:
sudo ln -s /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libjvm.so /usr/lib/
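
The location of libjvm.so differs between Java versions and vendors, so it can help to search for it rather than assume the path. A minimal sketch follows; the helper name find_libjvm_dir is hypothetical, and it simply takes the first match if a JDK ships more than one libjvm.so.

```shell
# Hypothetical helper: print the directory of the first libjvm.so found
# under the given JDK or JRE root; returns non-zero if none is found.
find_libjvm_dir() {
    lib=$(find "$1" -name libjvm.so -type f 2>/dev/null | head -n 1)
    [ -n "$lib" ] && dirname "$lib"
}

# Usage (the JDK root is the one from the ln command above):
# dir=$(find_libjvm_dir /usr/lib/jvm/java-7-openjdk-amd64) && sudo ln -s "$dir/libjvm.so" /usr/lib/
```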
Error
R CMD javareconf

/usr/lib/R/bin/javareconf: 405: /usr/lib/R/bin/javareconf: cannot create /usr/lib/R/etc/Makeconf.new: Permission denied
Solution
sudo -i R CMD javareconf
OR
R CMD javareconf JAVA=jdk1.8.0_60/jre/bin/java JAVA_HOME=jdk1.8.0_60/ JAVAC=jdk1.8.0_60/bin/javac JAR=jdk1.8.0_60/bin/jar JAVAH=jdk1.8.0_60/bin/javah
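
The second form repeats the JDK directory five times, which is easy to mistype. As a sketch, the arguments can be generated from a single JDK root. The helper name javareconf_args and the JDK_ROOT variable are hypothetical, and the sketch assumes the root is given as a full path without spaces.

```shell
# Hypothetical helper: print the javareconf arguments derived from one JDK
# root, mirroring the variables used in the command above.
javareconf_args() {
    home=${1%/}    # drop a trailing slash, if any
    printf 'JAVA=%s/jre/bin/java JAVA_HOME=%s JAVAC=%s/bin/javac JAR=%s/bin/jar JAVAH=%s/bin/javah\n' \
        "$home" "$home" "$home" "$home" "$home"
}

# Usage (JDK_ROOT is a placeholder for the full path of the JDK directory):
# sudo R CMD javareconf $(javareconf_args "$JDK_ROOT")
```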
