- Apache Flink
- Combines features from RDBMSs (query optimization capabilities) and MapReduce (scalability)
- Write like a programming language, execute like a database
- Like Spark, the Flink execution engine aggressively uses in-memory execution, but degrades gracefully to disk-based execution when memory is not enough
- Flink supports these storage systems: HDFS, HBase, local FS, S3, JDBC
- Runs locally, on a cluster, and on YARN
In this post we will see how to set up Apache Flink in local mode.
Once that is done, we will run a Flink job on files stored in HDFS.
#Download the latest Flink and un-tar the file.
bdalab@bdalabsys:/$ tar -xvzf flink-0.8-incubating-SNAPSHOT-bin-hadoop2.tgz
#rename the folder
bdalab@bdalabsys:/$ mv flink-0.8-incubating-SNAPSHOT/ flink-0.8
#change the working directory to the Flink home
bdalab@bdalabsys:/$ cd flink-0.8
#start Flink in local mode
bdalab@bdalabsys:flink-0.8/$ ./bin/start-local.sh
#The command above starts the JobManager. Check its status with jps
bdalab@bdalabsys:flink-0.8/$ jps
6740 Jps
6725 JobManager
#The JobManager web UI starts on port 8081 by default
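The jps check above can also be scripted. The following is a minimal sketch, assuming jps prints one "<pid> <name>" pair per line as in the listing above; the function name jobmanager_pid is made up for this post and is not part of Flink.

```shell
# jobmanager_pid: hypothetical helper (not a Flink tool).
# Reads `jps` output on stdin and prints the PID of every line
# whose process name is exactly "JobManager".
jobmanager_pid() {
  awk '$2 == "JobManager" { print $1 }'
}
```

Running `jps | jobmanager_pid` should print the PID when the JobManager is up and nothing otherwise. If curl is installed, `curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8081` is one way to see whether the web UI answers on port 8081.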
Now we have everything up and running, so let's try to run a job.
Most of us are familiar with the WordCount example from distributed computing, so let's begin with WordCount in Flink.
#The *-WordCount.jar file is available under $FLINK_HOME/examples
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar /home/ipPath /home/flinkop
The command above reads its input from the local file system and stores the result back to the local file system.
#To run the same job on a file in HDFS
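To sanity-check the result of a WordCount run, the same kind of count can be computed on a small input with standard shell tools. This is a rough sketch and not the Flink example's code: it assumes words are lowercased and split on non-word characters, and word_count is a name invented here.

```shell
# word_count: hypothetical helper for cross-checking a WordCount result.
# Reads text on stdin and prints "word count" lines sorted by word.
# Assumption: words are lowercased and split on anything that is not
# a letter, digit, or underscore.
word_count() {
  tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alnum:]_' '\n' |   # one word per line
    grep -v '^$' |               # drop empty tokens
    LC_ALL=C sort |
    uniq -c |
    awk '{ print $2, $1 }'       # reorder to "word count"
}
```

For example, `printf 'To be, or not to be\n' | word_count` prints `be 2`, `not 1`, `or 1`, `to 2`, one per line. Running it over a small input file and comparing a few words against the job's output is usually enough to spot a wrong input path.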
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar hdfs://localhost:9000/ip/tvvote hdfs://localhost:9000/op/
Make sure the HDFS daemons are up and running; otherwise the job will fail with an error.
#bin/flink has 4 major actions.
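Whether the HDFS daemons are running can be checked with jps before submitting the job. The sketch below assumes a single-node HDFS where NameNode and DataNode both show up in jps on this machine; missing_hdfs_daemons is a hypothetical helper name, not a Hadoop or Flink command.

```shell
# missing_hdfs_daemons: hypothetical helper (not part of Hadoop or Flink).
# Reads `jps` output on stdin and prints each expected HDFS daemon
# (NameNode, DataNode) that does not appear in it.
missing_hdfs_daemons() {
  listing=$(cat)
  for daemon in NameNode DataNode; do
    # Match the daemon name exactly in the second column, so that
    # SecondaryNameNode is not mistaken for NameNode.
    if ! printf '%s\n' "$listing" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      printf '%s\n' "$daemon"
    fi
  done
}
```

`jps | missing_hdfs_daemons` prints nothing when both daemons are running and otherwise names the ones that are missing. Listing the input path with `hdfs dfs -ls hdfs://localhost:9000/ip/tvvote` is another quick way to confirm that HDFS is reachable and the file exists.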
- run #runs a program
- info #displays information about a program.
- list #lists running and finished programs (options: -r, -s)
- cancel #cancels a running program (option: -i)
bdalab@bdalabsys:flink-0.8/$ bin/flink list -r -s
In the next post I will explain how to set up Flink in cluster mode.