Sunday, January 4, 2015

Setup Apache Flink on cluster mode - Ubuntu

Configure / Setup Apache Flink on a Hadoop cluster - Ubuntu
This is a continuation of the previous post on installing Flink in local mode. In this blog we will see how to set up Apache Flink on a cluster with Hadoop; once that is done we will execute / run a Flink job on files stored in HDFS.

If you are new to Hadoop, see here for setting up a Hadoop cluster

#start Hadoop cluster- HDFS
bdalab@bdalabsys: HADOOP_HOME/$ ./sbin/start-dfs.sh
#start Hadoop cluster- YARN MR2
bdalab@bdalabsys: HADOOP_HOME/$ ./sbin/start-yarn.sh
make sure all the Hadoop daemons are up and running

#Download the latest Flink (matching your Hadoop version) and un-tar the file.
bdalab@bdalabsys:/$ tar -xvzf flink-0.8-incubating-SNAPSHOT-bin-hadoop2.tgz
#rename the folder
bdalab@bdalabsys:/$ mv flink-0.8-incubating-SNAPSHOT/ flink-0.8
#move the working dir into flink_home
bdalab@bdalabsys:/$ cd flink-0.8
#Similar to the HDFS configuration, edit the file $FLINK_HOME/conf/slaves and
#enter the IP/hostname of each worker node.
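As a sketch, with hypothetical hostnames master, slave1 and slave2, the two relevant files could look like this (jobmanager.rpc.address lives in conf/flink-conf.yaml):

```
# $FLINK_HOME/conf/slaves -- one worker hostname/IP per line
# (hostnames here are examples; use your own)
slave1
slave2

# $FLINK_HOME/conf/flink-conf.yaml -- point the workers at the master
jobmanager.rpc.address: master
```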

#Enable Password less ssh from master to all the slave's
bdalab@bdalabsys:flink-0.8/$ ssh-keygen -t rsa -P ""
bdalab@bdalabsys:flink-0.8/$ ssh-copy-id -i /home/bdalab/.ssh/id_rsa.pub bdalab@slave1
repeat the last step for each slave listed in the conf/slaves file.
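Rather than repeating ssh-copy-id by hand for every worker, the step can be scripted. A minimal sketch, assuming the bdalab user and the conf/slaves file from above (the helper name copy_keys_to_slaves is mine):

```shell
# copy_keys_to_slaves FILE -- run ssh-copy-id for every non-empty
# hostname listed in FILE (one host per line, as in conf/slaves)
copy_keys_to_slaves() {
  while read -r host; do
    [ -n "$host" ] || continue                       # skip blank lines
    ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "bdalab@$host"
  done < "$1"
}

# usage: copy_keys_to_slaves conf/slaves
```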

#run flink on cluster mode
bdalab@bdalabsys:flink-0.8/$ ./bin/start-cluster.sh
....
Starting job manager
Starting task manager on host bdalabsys
.....
#The JobManager and TaskManagers will be started by the above command. Check the status by
bdalab@bdalabsys:flink-0.8/$ jps
6740 Jps
6725 JobManager
6895 TaskManager

Flink in cluster mode works with both the local filesystem and HDFS. If you want to
process data from HDFS, make sure all the HDFS daemons are up and running.
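One way to sanity-check the daemons is to scan the jps listing for the expected process names. A small sketch (the helper name missing_daemons and the exact daemon list are my assumptions; adjust them to your setup):

```shell
# missing_daemons -- read `jps` output on stdin and print the name of
# each expected Hadoop daemon that does not appear in the listing
missing_daemons() {
  listing=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    printf '%s\n' "$listing" | grep -qw "$d" || echo "$d"
  done
}

# usage: jps | missing_daemons   (no output means everything is up)
```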

Saturday, December 27, 2014

Apache Flink setup on ubuntu

    Apache Flink

  • Combines features from RDBMSs (query optimization capabilities)
    and MapReduce (scalability)
  • Write like a programming language, execute like a database
  • Like Spark, Flink has an execution engine that aggressively uses
    in-memory execution, but gracefully degrades to
    disk-based execution when memory is not enough
  • Flink supports several filesystems / data sources: HDFS, HBase, local FS, S3, JDBC.
  • Runs in Local, Cluster and YARN modes
In this blog we will see how to set up Apache Flink in local mode;
once that is done we will execute / run a Flink job on files stored in HDFS.

#Download the latest Flink and un-tar the file.
bdalab@bdalabsys:/$ tar -xvzf flink-0.8-incubating-SNAPSHOT-bin-hadoop2.tgz
#rename the folder
bdalab@bdalabsys:/$ mv flink-0.8-incubating-SNAPSHOT/ flink-0.8
#move the working dir into flink_home
bdalab@bdalabsys:/$ cd flink-0.8
#start Flink on local mode
bdalab@bdalabsys:flink-0.8/$ ./bin/start-local.sh
#The JobManager will be started by the above command. Check the status by
bdalab@bdalabsys:flink-0.8/$ jps
6740 Jps
6725 JobManager
#The JobManager web UI starts by default on port 8081. Now we have everything up and running; let's try to run a job.
As we are all aware of the familiar WordCount example in distributed
computing, let's begin with WordCount in Flink

#*-WordCount.jar file available under $FLINK_HOME/examples
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar /home/ipPath /home/flinkop
The above command runs on a file from the local filesystem and stores the result
back to the local filesystem.

#If we want to process the same in HDFS
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar hdfs://localhost:9000/ip/tvvote hdfs://localhost:9000/op/
make sure the HDFS daemons are up and running, else you will get an error.
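If the input is not in HDFS yet, it has to be staged there first. A sketch using the same paths as above (the local file name and hdfs://localhost:9000 are examples assuming a default pseudo-distributed setup):

```
#create the input dir and copy a local file into it
bdalab@bdalabsys:flink-0.8/$ hdfs dfs -mkdir -p /ip
bdalab@bdalabsys:flink-0.8/$ hdfs dfs -put /home/ipPath/tvvote /ip/tvvote
#after the job finishes, inspect the result
bdalab@bdalabsys:flink-0.8/$ hdfs dfs -cat /op/*
```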
#bin/flink has 4 major actions.
  • run #runs a program
  • info #displays information about a program
  • list #lists running (-r) and scheduled (-s) programs
  • cancel #cancels a running program by its JobID (-i)
#Display the running and scheduled JobIDs by
bdalab@bdalabsys:flink-0.8/$ bin/flink list -r -s
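The JobID printed by list can then be passed to cancel. A sketch (the JobID below is made up):

```
#cancel a running job by its JobID
bdalab@bdalabsys:flink-0.8/$ bin/flink cancel -i a1b2c3d4e5f6
```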


In the next blog I will explain how to set up Flink in cluster mode
