Sunday, January 4, 2015

Set up Apache Flink in cluster mode - Ubuntu

This is a continuation of the previous post on installing Flink locally. In this post we will see how to set up Apache Flink on a cluster with Hadoop. Once that is done, we will run a Flink job on files stored in HDFS.

If you are new to Hadoop, see here for setting up a Hadoop cluster.

#start Hadoop cluster- HDFS
bdalab@bdalabsys: HADOOP_HOME/$ ./sbin/
#start Hadoop cluster- YARN MR2
bdalab@bdalabsys: HADOOP_HOME/$ ./sbin/
Make sure all the Hadoop daemons are up and running.

#Download the latest Flink build (matching your Hadoop version) and un-tar the file.
bdalab@bdalabsys:/$ tar -xvzf flink-0.8-incubating-SNAPSHOT-bin-hadoop2.tgz
#rename the folder
bdalab@bdalabsys:/$ mv flink-0.8-incubating-SNAPSHOT/ flink-0.8
#change the working directory to the Flink home
bdalab@bdalabsys:/$ cd flink-0.8
#Similar to the HDFS configuration, edit the file $FLINK_HOME/conf/slaves and
enter the IP address or host name of each worker node, one per line.
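
As a rough sketch of the step above, the slaves file can also be written from the shell instead of an editor. The helper name write_slaves and the host slave2 are made up for illustration; slave1 is the host used later in this post.

```shell
# Sketch: write one worker host per line into a slaves file.
# Usage: write_slaves FILE HOST...
# (write_slaves is a hypothetical helper, not part of Flink.)
write_slaves() {
  file="$1"
  shift
  # Refuse to write an empty list of hosts.
  [ "$#" -gt 0 ] || return 1
  # Print each remaining argument on its own line, replacing the file.
  printf '%s\n' "$@" > "$file"
}
```

For example, from the Flink home directory: write_slaves conf/slaves slave1 slave2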

#Enable passwordless SSH from the master to all the slaves
bdalab@bdalabsys:flink-0.8/$ ssh-keygen -t rsa -P ""
bdalab@bdalabsys:flink-0.8/$ ssh-copy-id -i /home/bdalab/.ssh/ bdalab@slave1
Repeat the last step for every slave listed in the conf/slaves file.
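
Repeating the step by hand gets tedious with many workers, so here is a small sketch that prints one ssh-copy-id command per host in the slaves file. It is a dry run only: nothing is sent to the remote hosts. The function name print_copy_id_commands is hypothetical, and the printed commands leave out -i, so they assume ssh-copy-id picks up your default public key.

```shell
# Sketch: print one ssh-copy-id command per host listed in a slaves file.
# Usage: print_copy_id_commands SLAVES_FILE USER
# Dry run: the commands are only printed, never executed.
print_copy_id_commands() {
  slaves_file="$1"
  user="$2"
  # The "|| [ -n ... ]" part also handles a last line without a newline.
  while IFS= read -r host || [ -n "$host" ]; do
    # Skip blank lines.
    [ -n "$host" ] || continue
    printf 'ssh-copy-id %s@%s\n' "$user" "$host"
  done < "$slaves_file"
}
```

For example, print_copy_id_commands conf/slaves bdalab lists the commands for review. Once they look right, they can be run one by one.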

#run flink on cluster mode
bdalab@bdalabsys:flink-0.8/$ ./bin/
Starting job manager
Starting task manager on host bdalabsys
#The JobManager is started by the above command. Check the status with
bdalab@bdalabsys:flink-0.8/$ jps
6740 Jps
6725 JobManager
6895 TaskManager
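
Instead of reading the jps listing by eye, the check can be scripted. The sketch below reads jps-style output on standard input and reports any daemon name that is missing. The function name check_daemons is hypothetical, and it assumes each line has the form "<pid> <name>", as in the listing above.

```shell
# Sketch: verify that each named daemon appears in jps-style output on stdin.
# Usage: jps | check_daemons NAME...
# Returns 0 if all names are present, 1 otherwise.
check_daemons() {
  listing=$(cat)
  missing=0
  for name in "$@"; do
    # Match the daemon name exactly against the second field of each line.
    if ! printf '%s\n' "$listing" |
         awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For the Flink daemons shown above: jps | check_daemons JobManager TaskManager
The same function can be pointed at the Hadoop daemons, for example NameNode and DataNode.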

Flink in cluster mode works with both the local file system and HDFS. If you want to process
data from HDFS, make sure all the HDFS daemons are up and running.