Monday, October 3, 2016

Hillary Clinton VS Donald Trump: Twitter statistics

US presidential election 2016: Twitter battle, Hillary vs Trump

@HillaryClinton vs @realDonaldTrump Twitter stats:

Name | @HillaryClinton | @realDonaldTrump
Followers | 9,336,300 | 12,023,174
Most used hashtags (both accounts) | #MAGA, #DEBATENIGHT, #DEBATES2016
Celebrities who follow | 37 | 25
Most influential follower | @MargaretAtwood | @CordeiroRick
Most retweeted tweet | "This is...unhinged, even for Trump. A few notes." | "For those few people knocking me for tweeting at three o'clock in the morning, at least you know I will be there, awake, to answer the call!"
Tweet sources | Twitter Web Client, Twitter for iPhone | Twitter for iPhone, Twitter for Android
Most frequent words | trump, hillary, together, makes, taxes, paid | hillary, join, mega, debate
Mentions | @realDonaldTrump, @AmeriCorps, @BernieSanders, @NYTimes, @Peacecorps, @CNN | @HillaryClinton, @donlemon, @foxandfriends, @GovernorSununu


Saturday, October 1, 2016

Weka Hadoop Integration - weka read/write data from HDFS

Weka Hadoop integration using the distributedWekaHadoop package

In Weka, open Tools > Package manager, search for "distributedWekaHadoop" and install the packages.

Now go back to the Weka KnowledgeFlow. You will find HDFSLoader and HDFSSaver in the DataSources and DataSinks sections, and you can use them to read data from and write data to HDFS.

Thursday, April 7, 2016

Top 10 Deep Learning Tools for R/Python/Java/Matlab/C++


Deep Learning Tools For R

H2O - Parallel distributed machine learning algorithms such as generalized linear models, gradient boosting machines, random forests, and neural networks (deep learning) within various cluster environments

Deep Learning Tools For R and Python

MXNet - A deep learning framework designed for both efficiency and flexibility. It lets you mix symbolic and imperative programming to maximize efficiency and productivity.

Neural Designer - A deep learning tool for predictive analytics. It uses deep learning algorithms to discover intricate relationships, recognize complex patterns and predict trends in your data.

Deep Learning Tools For Python

Theano - Allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

Caffe - Caffe is a deep learning framework made with expression, speed, and modularity in mind. matcaffe, the Caffe interface in caffe/matlab, lets you integrate Caffe into your Matlab code.

TensorFlow - Provides a straightforward way for users to train computers to perform tasks by feeding them large amounts of data. The software incorporates various methods for efficiently building and training simulated “deep learning” neural networks across different computer hardware.

Deep Learning Tools For Java

DL4J - The first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. DL4J is integrated with Hadoop and Spark and is designed for business environments rather than as a research tool.

Deep Learning Tools For C++/Scripting

OpenNN - An open-source C++ library for neural networks. It is intended for advanced users with strong C++ and machine learning skills.

Torch - Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.

Elektronn - A deep learning toolkit that makes powerful neural networks accessible to scientists outside the machine learning community. It is written in Python and based on the Theano framework.

Thursday, March 31, 2016

taskscheduleR: Scheduling R script via R and Rstudio

Scheduling R scripts from R / RStudio

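taskscheduleR - Lets users schedule R scripts to run at specific time points from R / RStudio itself. The package wraps the Windows Task Scheduler, so you do not have to create the task in the Task Scheduler yourself. It works on Windows only. On Linux/Unix, the companion package cronR does the same job with cron.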
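On Windows, taskscheduleR creates its tasks through the schtasks command-line tool. The Python sketch below builds a comparable schtasks argument list but does not run it, to show roughly what such a scheduled task looks like. The task name, the Rscript.exe path and the script path are hypothetical placeholders. Only a subset of /SC schedule types is accepted here, and paths containing spaces would need extra quoting that this sketch does not handle.

```python
from typing import List

# Subset of schtasks /SC schedule types that need only a start time (illustrative, not exhaustive).
SIMPLE_SCHEDULES = {"MINUTE", "HOURLY", "DAILY", "WEEKLY", "MONTHLY", "ONCE"}


def schtasks_command(task_name: str, rscript_exe: str, script_path: str,
                     schedule: str = "DAILY", start_time: str = "09:00") -> List[str]:
    """Build (but do not execute) a schtasks argument list that runs an R script.

    Paths containing spaces would need extra quoting; this sketch does not handle that.
    """
    schedule = schedule.upper()
    if schedule not in SIMPLE_SCHEDULES:
        raise ValueError(f"unsupported schedule: {schedule}")
    hours, minutes = start_time.split(":")
    if not (0 <= int(hours) <= 23 and 0 <= int(minutes) <= 59):
        raise ValueError(f"invalid start time: {start_time}")
    return [
        "schtasks", "/Create",
        "/TN", task_name,                       # task name shown in Task Scheduler
        "/TR", f"{rscript_exe} {script_path}",  # command the task runs
        "/SC", schedule,                        # schedule type
        "/ST", start_time,                      # start time, HH:MM (24-hour clock)
    ]


# Hypothetical paths, for illustration only.
cmd = schtasks_command("daily_report", r"C:\R\bin\Rscript.exe", r"C:\scripts\report.R")
print(" ".join(cmd))
```

On a Windows machine, the list could then be passed to subprocess.run. The package does all of this for you, which is the point of using it.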

Installation of taskscheduleR

install.packages('taskscheduleR')

If you want the RStudio add-in to work, also install miniUI and shiny:

install.packages('miniUI')
install.packages('shiny')
Open the Rscheduler window

Click the Addins menu in RStudio and select the taskscheduleR add-in.

You can also open the Rscheduler window in a browser by running the command below in the R console.

R prints "Listening on " followed by an address. Open that address in the browser to get the same window.

Friday, March 25, 2016

Analyze Data to Identify type of Readmissions using PostgreSQL

Analyze Data to Identify Causes of Readmissions using PostgreSQL

What is patient readmission?

A readmission is normally defined as a patient returning to the hospital within 30 days of a previous discharge.

Reasons for readmission

Multiple factors contribute to avoidable hospital readmissions. They may result from poor-quality care or from poor transitions between different providers and care settings.

Hospital readmission is receiving increased attention as a potential way to address problems in quality of care, cost of care and care transitions.

Readmission table structure

After cleansing the data and identifying the attributes required for the readmission details, my entity model looks like this:

patient_id disease adm_date dis_date
1069 Oncology 6/17/2013 6:51 6/21/2013 7:15
1078 Neuro 7/18/2013 12:10 7/20/2013 08:12
1082 Ortho 7/19/2013 12:10 7/22/2013 08:12
1085 Cardiothor 8/25/2013 12:10 8/27/2013 08:12
1085 Cardiothor 9/13/2013 12:10 9/16/2013 08:12

Now we write a query to list the readmissions. The condition is that the current admission date falls between a previous discharge date and 30 days after it.

Readmission query in PostgreSQL

Select count(*) "Readmission Count", disease
From readmission_view v
Where Exists (
    -- an earlier discharge of the same patient within 30 days before this admission
    Select * From readmission_view p
    Where p.patient_id = v.patient_id and
    v.adm_date between p.dis_date and p.dis_date + interval '30 days'
)
group by disease
order by count(*) desc
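To try the corrected query without a PostgreSQL server, here is a minimal sketch that runs the same EXISTS logic with Python's built-in sqlite3 module on the five sample rows above. It makes two assumptions. First, dates are stored as ISO-8601 text, because SQLite has no timestamp type. Second, PostgreSQL's `p.dis_date + interval '30 days'` is replaced by SQLite's `datetime(p.dis_date, '+30 days')`.

```python
import sqlite3

# Sample rows from the table above, with dates rewritten as ISO-8601 text so SQLite can compare them.
ROWS = [
    (1069, "Oncology",   "2013-06-17 06:51", "2013-06-21 07:15"),
    (1078, "Neuro",      "2013-07-18 12:10", "2013-07-20 08:12"),
    (1082, "Ortho",      "2013-07-19 12:10", "2013-07-22 08:12"),
    (1085, "Cardiothor", "2013-08-25 12:10", "2013-08-27 08:12"),
    (1085, "Cardiothor", "2013-09-13 12:10", "2013-09-16 08:12"),
]

# Same logic as the PostgreSQL query, using SQLite's datetime() for the 30-day window.
READMISSION_SQL = """
SELECT disease, COUNT(*) AS readmission_count
FROM readmission_view v
WHERE EXISTS (
    SELECT 1 FROM readmission_view p
    WHERE p.patient_id = v.patient_id
      AND v.adm_date BETWEEN p.dis_date AND datetime(p.dis_date, '+30 days')
)
GROUP BY disease
ORDER BY readmission_count DESC
"""


def readmission_counts(rows):
    """Load rows into an in-memory table and return (disease, readmission_count) pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readmission_view "
                "(patient_id INTEGER, disease TEXT, adm_date TEXT, dis_date TEXT)")
    con.executemany("INSERT INTO readmission_view VALUES (?, ?, ?, ?)", rows)
    result = con.execute(READMISSION_SQL).fetchall()
    con.close()
    return result


print(readmission_counts(ROWS))
```

On these five rows, only patient 1085 comes back within the window: the second admission is 17 days after the first discharge. The result is therefore `[('Cardiothor', 1)]`.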

Readmission result

disease Readmission Count
Ortho 878
Neuro 567
Oncology 155
The result obtained from the query can be used to find the percentage of patient readmissions per disease. Diseases with higher readmission rates need extra care.
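The percentage is the number of readmissions divided by the total number of admissions for the same disease, times 100. The query above returns only readmission counts, so the admission totals in this Python sketch are hypothetical placeholders. In practice they would come from a count(*) per disease over all admissions.

```python
def readmission_rate(readmissions: int, admissions: int) -> float:
    """Percentage of admissions that were readmissions within 30 days."""
    if admissions <= 0:
        raise ValueError("admissions must be positive")
    return 100.0 * readmissions / admissions


# Readmission counts come from the result above; the admission totals are HYPOTHETICAL.
readmission_counts = {"Ortho": 878, "Neuro": 567, "Oncology": 155}
admission_totals = {"Ortho": 4000, "Neuro": 3000, "Oncology": 1000}  # hypothetical

# Rank diseases by rate rather than raw count, since large departments naturally have more readmissions.
rates = {d: readmission_rate(n, admission_totals[d]) for d, n in readmission_counts.items()}
for disease, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{disease}: {rate:.1f}%")
```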

Friday, March 18, 2016

ERROR invalid input syntax for type date SQL state: 22007


Type casting varchar to date in a WHERE clause:

select * from patient_history
where admissiondate::DATE > releasedate::DATE


ERROR:invalid input syntax for type date:"" ********** Error **********

ERROR:invalid input syntax for type date:""
SQL state: 22007

In my case, type casting worked fine for admissiondate but not for releasedate. After looking into the releasedate field for a while, I found a few empty / null values in that column:

select * from patient_history where
releasedate is null OR releasedate =''

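Casting NULL to DATE simply yields NULL. An empty string '', however, is not a valid date, and that is what raises SQL state 22007. After I removed the empty entries from releasedate, the same type-casting query worked fine:

select * from patient_history where
admissiondate::DATE > releasedate::DATE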
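Instead of deleting rows, PostgreSQL's `NULLIF(releasedate, '')::DATE` turns empty strings into NULL before the cast. The Python sketch below mirrors that idea on hypothetical rows, assuming ISO `YYYY-MM-DD` dates. It shows that casting '' fails, just as it does in PostgreSQL, and that treating empty values as missing avoids the error.

```python
from datetime import datetime
from typing import Optional


def parse_date(value: Optional[str], fmt: str = "%Y-%m-%d") -> Optional[datetime]:
    """Return None for NULL or empty strings (like NULLIF(value, '')), else parse the date."""
    if value is None or value.strip() == "":
        return None
    return datetime.strptime(value.strip(), fmt)


# Hypothetical rows: (admissiondate, releasedate) stored as text, one with an empty releasedate.
rows = [("2016-03-01", "2016-02-20"), ("2016-03-05", ""), ("2016-03-10", "2016-03-12")]

# Casting "" directly fails, like ''::DATE in PostgreSQL (SQL state 22007).
try:
    datetime.strptime("", "%Y-%m-%d")
except ValueError as err:
    print("cast failed:", err)

# Compare only rows where both dates are present (a comparison with NULL is never true in SQL either).
matches = [
    (adm, rel) for adm, rel in rows
    if parse_date(rel) is not None and parse_date(adm) > parse_date(rel)
]
print(matches)
```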

Friday, March 4, 2016

Installing TeamPostgreSQL on 64 bit Ubuntu 14.x OS


TeamPostgreSQL - PostgreSQL Web Admin GUI Tools

boss@solai:~$ ./


Unpacking JRE ...
Preparing JRE ...
./ 256: ./ bin/unpack200: not found Error unpacking jar files. Aborting. You might need administrative priviledges for this operation.

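The JRE bundled with the installer is a 32-bit build. On a 64-bit system, the 32-bit libraries it needs have to be installed first:

boss@solai:~$ sudo dpkg --add-architecture i386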

boss@solai:~$ sudo apt-get update

boss@solai:~$ sudo apt-get install libc6:i386 libncurses5:i386 libstdc++6:i386

Now that the 32-bit libraries are installed, run the installer again:


boss@solai:~$ ./

Unpacking JRE ...
Preparing JRE ...
Starting Installer ...
Could not display the GUI. This application needs access to an X Server.
If you have access there is probably an X library missing.
******************************************************************* You can also run this application in console mode without access to an X server by passing the argument -c *******************************************************************
An error occurred:
/opt/sw/ cannot open shared object file: No such file or directory
Error log: /tmp/install4jError8210847315454692406.log

Run the installer again with the -c option to use console mode:

boss@solai:~$ ./ -c

Wednesday, March 2, 2016

Rserve - TCP/IP server which allows other programs to use facilities of R

Rserve is a TCP/IP server which allows other programs to use facilities of R from various languages without the need to initialize R or link against the R library. Every connection has a separate workspace and working directory. Client-side implementations are available for popular languages such as C/C++, PHP and Java. Rserve supports remote connection, authentication and file transfer.

Installing and starting Rserve
> install.packages('Rserve')

> library(Rserve)

> Rserve()


Calling Rserve() again while the first daemon is still running:

> Rserve()

Rserv started in daemon mode.
##> SOCK_ERROR: bind error #98(address already in use)

boss@passive:~$ ps faux | grep Rserve

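This lists the Rserve daemon that is already running. Either keep using that instance or stop it with kill <PID> before starting a new one.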
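Bind error #98 means another process, normally the first Rserve daemon, already holds the port. Rserve's default port is 6311. As a hedged sketch, the Python snippet below checks whether a port is free by trying to bind it. This is a generic socket check, not part of Rserve, and checking on 127.0.0.1 is an assumption.

```python
import socket

RSERVE_DEFAULT_PORT = 6311  # Rserve's default TCP port


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails, i.e. another socket already holds it.

    Any bind failure (including permission errors or a port lingering in TIME_WAIT)
    is reported as "in use", so treat True as a hint rather than proof.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
        return False


if port_in_use(RSERVE_DEFAULT_PORT):
    print("Port 6311 is taken; an Rserve daemon is probably already running.")
else:
    print("Port 6311 is free; Rserve() should be able to bind it.")
```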

Tuesday, March 1, 2016

Big Data / Data Analytics Jobs

Big Data Hadoop Jobs in India and around the world
Big Data Analyst (5+ years), May 2016
Chennai, India
Big Data - Principal Software Engineer (5+ years), May 05, 2016
Humana, Irving, Texas, USA
Hadoop and Spark Developer (5-7 years), May 05, 2016
CSC India, Dubai, UAE
Data Specialist: Advanced Analytics (4-10 years), May 01, 2016
IBM, Bangalore, India
Hadoop Developer (DWH) (5-9 years), April 12, 2016
Csi Software Pvt Ltd, Chennai, Tamil Nadu, India
Sr. Data Scientist, April 10, 2016
Allstate, Northbrook, IL, USA
Sr. Hadoop Developer (4-8 years), April 03, 2016
Swathi Business Solutions, Chennai, India (also Kuala Lumpur, Malaysia)
Java Hadoop Lead (Big Data) (7-10 years), April 03, 2016
Innominds, Hyderabad, India
Big Data Developer / Intern for Big Data, April 03, 2016
Frgma Data, Bengaluru, India
Analyst 1 - Apps Prog, April 01, 2016
Chennai, Tamil Nadu, India
Hadoop Data Engineer, March 31, 2016
Chennai, Tamil Nadu, India
Bigdata Developer, March 31, 2016
Chennai, India

Thursday, February 25, 2016

RPostgreSQL Data analytics on PostgreSQL data using R

RPostgreSQL - Data analytics on PostgreSQL data from R

Data analytics on PostgreSQL data using R. Working with R and PostgreSQL for large scale data analytics

Installing RPostgreSQL

Running install.packages('RPostgreSQL') failed with:

RS-PostgreSQL.h:23:26: fatal error: libpq-fe.h: No such file or directory
compilation terminated. /usr/lib/R/etc/Makeconf:128: recipe for target 'RS-PQescape.o' failed make: *** [RS-PQescape.o] Error 1 ERROR: compilation failed for package 'RPostgreSQL' * removing '/home/boss/R/x86_64-pc-linux-gnu-library/3.1/RPostgreSQL'


RPostgreSQL requires libpq-dev.

libpq is the C library interface to PostgreSQL: a set of library functions that allow client programs (in our case R) to pass queries to the PostgreSQL backend server and receive the results of these queries. libpq-dev contains its development headers, including the missing libpq-fe.h.

So libpq-dev has to be installed at the OS level, not from the R terminal:

solai@server# sudo apt-get install libpq-dev

Then install RPostgreSQL inside an R session:
> install.packages('RPostgreSQL')

Saturday, January 30, 2016

Easy way to fix error 3194 in iTunes when you restore or update iPhone6 or iPad


Recently I received an iPhone 6s as a gift; my brother had been using it. I thought it would be better to restore it to factory settings before I started using it. When I did so, I got error 3194.

Simple way to fix error 3194 in iTunes when you restore or update an iPhone
The iPhone "iPhone" could not be restored. An unknown error occurred (3194).


If you get this error in iTunes, do the following and try again:

1) Check your iTunes version and make sure you are using the latest version of iTunes.

2) Disable your firewall or antivirus software for at least 10 minutes.

This should fix the problem. If not, do one more step:

3) Check the system hosts file
    i) Open the hosts file in C:\Windows\System32\Drivers\etc with administrator privileges.

    ii) Find any entries like "" and remove the line, or comment it out by adding # at the start of the line.

    iii) Restart the system.

In most cases, the first two steps resolve the error.
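Step ii can also be done with a small script. The Python sketch below only rewrites the text of a hosts file, commenting out active lines that map a given host name. The host name `updates.example.com` and the sample file contents are hypothetical placeholders, because the entry from this post is not shown. Writing the result back to C:\Windows\System32\Drivers\etc\hosts still needs administrator privileges.

```python
def comment_out_host(hosts_text: str, hostname: str) -> str:
    """Comment out active hosts-file lines that map `hostname`; leave every other line unchanged.

    Line endings are normalized to "\n".
    """
    result = []
    for line in hosts_text.splitlines():
        fields = line.split()
        is_active = bool(fields) and not fields[0].startswith("#")
        # A hosts line is "IP-address hostname [aliases...]": compare the names only, case-insensitively.
        names = [f.lower() for f in fields[1:]]
        if is_active and hostname.lower() in names:
            result.append("# " + line)
        else:
            result.append(line)
    trailing = "\n" if hosts_text.endswith("\n") else ""
    return "\n".join(result) + trailing


sample = "127.0.0.1 localhost\n10.0.0.5 updates.example.com\n"  # hypothetical hosts file
print(comment_out_host(sample, "updates.example.com"), end="")
```

Running it twice is harmless, because lines that are already commented out are skipped.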

Wednesday, January 13, 2016

XGBoost in R. Error in xgb.DMatrix and can not open file -unknown- and Error in xgb.DMatrix


XGBoost in R

Preparing data model for XGBoost
Error in xgb.DMatrix(data, label = label) : can not open file "-unknown-"


Check whether your data has character or factor variables. xgb.DMatrix needs numeric data, so convert them to numeric first.


Predict using XGBoost model
pred <- predict(bst, xTest)

Error in xgb.DMatrix(newdata) : xgb.DMatrix: does not support to construct from list


Convert the test data into a matrix, which xgboost accepts:

pred <- predict(bst, as.matrix(xTest))

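The dummyVars function creates dummy variables. In other words, it translates text (categorical) data into numerical data.

dummyVars("~ gender", data = df_all)

Error: could not find function "dummyVars"

dummyVars comes from the caret package, so load caret first:

library(caret)
dv <- dummyVars("~ gender", data = df_all)
df_num <- predict(dv, newdata = df_all)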
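What dummyVars does, and what xgboost needs (an all-numeric matrix), can be sketched in plain Python. This is an illustrative one-hot encoder, not caret's implementation. The row layout, the `age`/`gender` columns and the `column.level` naming are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def to_numeric_matrix(rows: Sequence[Row],
                      categorical: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """Keep numeric columns as floats and expand each categorical column into 0/1 indicator columns."""
    columns = list(rows[0].keys())
    names: List[str] = []
    encoders = []  # (column, level) for indicator columns, (column, None) for numeric ones
    for col in columns:
        if col in categorical:
            for level in sorted({str(r[col]) for r in rows}):
                names.append(f"{col}.{level}")
                encoders.append((col, level))
        else:
            names.append(col)
            encoders.append((col, None))
    matrix = [
        [float(str(r[col]) == level) if level is not None else float(r[col])
         for col, level in encoders]
        for r in rows
    ]
    return names, matrix


rows = [{"age": 34, "gender": "male"}, {"age": 29, "gender": "female"}]  # hypothetical data
names, X = to_numeric_matrix(rows, ["gender"])
print(names)
print(X)
```

In real use, derive the levels from the training data and reuse them for the test data, as you would by calling predict() on the same dummyVars object for both. Otherwise the test matrix can end up with different columns from the one the model was trained on.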




Wednesday, January 6, 2016

RHadoop - Running Hadoop MapReduce from R

Running Hadoop MapReduce from R. Working with R and Hadoop for large-scale data analytics.

Running a MapReduce job
m <- mapreduce(InputPath, input.format = 'csv', map = mapFunction, reduce = reduceFunction )

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1


The error suggests that the map step could not find the input file. I was reading the file from the local file system, so the rmr2 backend has to be set to local:

rmr.options(backend = "local")
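With backend = "local", rmr2 runs the map and reduce functions inside the R session instead of on the Hadoop cluster. Conceptually, every MapReduce job is map, then shuffle (group by key), then reduce. The Python sketch below shows that flow on a few hypothetical CSV lines. The column layout (country, year, gdp) is an assumption, since the post does not show GdpData.csv. The reduce function takes a key and all of its values, mirroring rmr2's reduce(k, vv) convention.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def local_mapreduce(records: Iterable[str],
                    map_fn: Callable[[str], Iterator[Tuple[str, float]]],
                    reduce_fn: Callable[[str, List[float]], float]) -> Dict[str, float]:
    """Run map -> shuffle -> reduce in a single process."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):  # map: emit (key, value) pairs
            groups[key].append(value)       # shuffle: collect the values for each key
    # reduce: one call per key with all of that key's values
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def gdp_map(line: str) -> Iterator[Tuple[str, float]]:
    # Hypothetical layout: country,year,gdp
    country, _year, gdp = line.split(",")
    yield country, float(gdp)


def gdp_reduce(country: str, values: List[float]) -> float:
    return sum(values)


lines = ["X,2014,10.0", "X,2015,12.5", "Y,2015,7.0"]  # toy rows, not real GDP data
print(local_mapreduce(lines, gdp_map, gdp_reduce))
```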

Reading a file from HDFS in an R session
f = hdfs.line.reader("/HDFS/GdpData.csv");

Error in .jcheck(silent = TRUE) :
No running JVM detected. Maybe .jinit() would help.


Run .jinit() (from the rJava package) to initialize the JVM.


Tuesday, January 5, 2016

RHadoop - rJava installtion Error and solution

RHadoop - Error and solution

Installing the RHadoop packages. Working with R and Hadoop for large-scale data analytics.

Installing the rJava package from a local file:

install.packages("/home/bdlnn/Downloads/rJava_0.9-7.tar.gz", repos = NULL)

configure: error: Java interpreter '/usr/lib/jvm/default-java/jre/bin/java' does not work ERROR: configuration failed for package "rJava" * removing "/usr/local/lib/R/site-library/rJava"



Point R's Java configuration at a working JDK:

solai@vm1$ sudo R CMD javareconf JAVA_HOME='/home/bdlnn/Software/jdk1.7.0_79'

Don't put a trailing '/' at the end of the JAVA_HOME path.
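The JAVA_HOME path can be checked before running javareconf. This Python sketch is an illustration, not part of R. It does three things:

- strips a trailing '/' from a JAVA_HOME value;
- checks that JAVA_HOME/bin/java exists and is executable (the kind of interpreter javareconf complained about);
- builds the javareconf command line.

The example path is the one from this post with a trailing '/' added.

```python
import os
from typing import Optional


def normalize_java_home(java_home: str) -> str:
    """Drop trailing '/' characters (keeping '/' itself if that is all there is)."""
    return java_home.rstrip("/") or "/"


def java_binary(java_home: str) -> Optional[str]:
    """Return JAVA_HOME/bin/java if it is an executable file, otherwise None."""
    candidate = os.path.join(normalize_java_home(java_home), "bin", "java")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


def javareconf_command(java_home: str) -> str:
    """Build the javareconf command line with a normalized JAVA_HOME."""
    return f"sudo R CMD javareconf JAVA_HOME='{normalize_java_home(java_home)}'"


home = "/home/bdlnn/Software/jdk1.7.0_79/"  # path from the post, with a trailing '/' added
if java_binary(home) is None:
    print("No executable bin/java under", normalize_java_home(home))
print(javareconf_command(home))
```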

Alternatively, install the Ubuntu package:

solai@vm1$ sudo apt-get install r-cran-rjava

and then load it inside R:

> library(rJava)

Saturday, January 2, 2016

Will Tamil Nadu political parties get a Twitter hashflag for the TN2016 social election campaign?

According to twitter
     Hashflags are images that appear after a #hashtag and are enabled on Twitter for specific occasions or events. Not every hashtag gets one, only the ones Twitter deems worthy of such treatment. They are sometimes referred to as custom Twitter emojis.

Some hashflags for India-based events:
#makeinindia #happydiwali #ipl

Last year, Twitter launched hashflags for the political parties in the UK and Spanish general elections.

Here is a population comparison of Spain and the UK with the Indian state of Tamil Nadu:

Spain | UK | Tamil Nadu
47,847,339 | 63,843,856 | 76,656,206

Tamil Nadu has a larger population than either country: about 20% more than the UK and about 60% more than Spain. I hope Twitter will create hashflags for parties like #AIADMK, #DMK, #DMDK and #PMK, and maybe #sagayam, to brighten up the election coverage.