Saturday, December 27, 2014

Apache Flink setup on Ubuntu

    Apache Flink

  • Combines features from RDBMSs (query optimization capabilities)
    and MapReduce (scalability)
  • Write like a programming language, execute like a database
  • Like Spark, Flink is an execution engine that aggressively uses
    in-memory execution, but degrades gracefully to
    disk-based execution when memory is not enough
  • Flink supports these data sources and filesystems: HDFS, HBase, local FS, S3, JDBC.
  • Runs locally, on a cluster, and on YARN
In this post we will see how to set up Apache Flink in local mode.
Once that is done, we will run a Flink job on files stored in HDFS.

#Download the latest Flink and un-tar the file.
bdalab@bdalabsys:/$ tar -xvzf flink-0.8-incubating-SNAPSHOT-bin-hadoop2.tgz
#rename the folder
bdalab@bdalabsys:/$ mv flink-0.8-incubating-SNAPSHOT/ flink-0.8
#change the working directory to flink_home
bdalab@bdalabsys:/$ cd flink-0.8
#start Flink in local mode
bdalab@bdalabsys:flink-0.8/$ ./bin/
#The JobManager is started by the above command. Check its status with
bdalab@bdalabsys:flink-0.8/$ jps
6740 Jps
6725 JobManager
#The JobManager web UI is started by default on port 8081
Now we have everything up and running, so let's try to run a job.
WordCount is the familiar example in distributed computing,
so let's begin with WordCount in Flink.

#The *-WordCount.jar file is available under $FLINK_HOME/examples
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar /home/ipPath /home/flinkop
The above command reads its input from the local file system and stores
the result back to the local file system.
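
To sanity-check the job output on a small local input, the same kind of count can be produced with standard Unix tools. This is only a rough sketch: the function name `local_wordcount` is made up for illustration, and the tokenization (lowercase everything, split on anything that is not a letter, digit or underscore) is my assumption about what the example job does, so the result may not match Flink's output exactly.

```shell
# Hypothetical helper, not part of Flink: count the words in a local file.
# Assumption: words are lowercased and split on non-alphanumeric characters.
local_wordcount() {
  tr '[:upper:]' '[:lower:]' < "$1" \
    | tr -cs '[:alnum:]_' '\n' \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2, $1 }'   # print "word count", one pair per line
}
```

For example, `local_wordcount /home/ipPath` prints one "word count" pair per line, which can then be compared by eye with the files written under /home/flinkop.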

#To process the same data in HDFS
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar hdfs://localhost:9000/ip/tvvote hdfs://localhost:9000/op/
Make sure the HDFS daemons are up and running; otherwise the job will fail with an error.
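
The HDFS check can be done with the same jps command used earlier for the JobManager. Below is a minimal sketch, assuming a single-node setup where both NameNode and DataNode run on this machine; the helper name `hdfs_daemons_up` is hypothetical and it simply reads jps-style "PID Name" lines from standard input.

```shell
# Hypothetical helper, not part of Flink or Hadoop.
# Reads "PID Name" lines (as printed by jps) on stdin and succeeds
# only if both a NameNode and a DataNode process are listed.
hdfs_daemons_up() {
  awk '$2 == "NameNode" { nn = 1 }
       $2 == "DataNode" { dn = 1 }
       END { exit !(nn && dn) }'
}

# Usage (from the Flink home directory):
#   jps | hdfs_daemons_up || echo "HDFS daemons are not running"
```

Matching on the exact process name means a SecondaryNameNode entry alone does not count as a running NameNode.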
#bin/flink has 4 major actions.
  • run #runs a program
  • info #displays information about a program
  • list #lists running and finished programs; options -r (running) and -s (scheduled)
  • cancel #cancels a running program; the job ID is passed with -i
#Display the running job IDs with
bdalab@bdalabsys:flink-0.8/$ bin/flink list -r -s

In the next post I will explain how to set up Flink in cluster mode.

