Saturday, November 28, 2015

Apache Spark : how to start worker across cluster

Apache Spark : how to start worker across cluster
previous post - working on RPostgresql

Apache Spark cluster setup. refer if you have any issue Spark over Yarn

How to start worker node from newly added Spark slaves.?
Spark has two slave start-up script under sbin/ dir -: to start all the worker across slaves machine. this should run from Master node -: to start Worker daemon from each and individual Slave. this should run from each slave node. Ex:

sbin/ spark://
above command need to run from slave machine. here is where Spark Master running.
shutting down Netty transport
sbin/ spark://
15/11/27 19:53:53 ERROR NettyTransport: failed to bind to /, shutting down Netty transport
Error due to improper configuration in /etc/hosts. set SPARK_LOCAL_IP to pointng to the local worker system.


Next :

Friday, November 27, 2015

RPostgresql : how to pass dynamic parameter to dbGetQuery statement

RPostgresql : R and PostgreSQL Database

Working with RPostgreSQL package

How to pass dynamic / runtime parameter to dbGetQuery in RPostgrSQL ?
#use stri_paste to form a query and pass it into dbGetQuery icd = 'A09'

qry <- stri_paste("SELECT * FROM visualisation.ipd_disease_datamart WHERE icd ='", icd, "'",collapse="")

rs1 <- dbGetQuery(con, qry)
 Error in postgresqlNewConnection(drv, ...) : 
  RS-DBI driver: (cannot allocate a new connection -- 
maximum of 16 connections already opened)
drv <- dbDriver("PostgreSQL")

con <- dbConnect(drv, dbname="DBName", host="",port=5432,user="yes",password="yes")
close the connection. max 16 connection can able to establish from R to PostgreSQL, if exceeds this limit, will throw error.
##list all the connections

## Closes the connection

## Frees all the resources on the driver
#OR on.exit(dbUnloadDriver(drv), add = TRUE)

How to close/drop all the connection Postgresql session.?

We can terminate the PostgreSQL connection using "pg_terminate_backend" SQL command.
In my case I was open up 16 connection using RPostgreSQL unfortunately forget to release them.
So I ended up with Max. connection exceed limit.
SELECT pg_terminate_backend( FROM pg_stat_activity WHERE client_addr = '' and pid > 20613 AND pid <> pg_backend_pid();
In above query, pg_stat_activity will return list of all the active connection.
I have terminating only the connection from R session which made from the (client_addr) IP