Wednesday, April 30, 2014

Configuring Eclipse for Apache Hadoop 1.x/2.x : Debian/BOSS Operating System

How to configure the Eclipse IDE to work with Apache Hadoop on BOSS/Debian OS
Why Configure Eclipse for Apache Hadoop?

  1. To work with the Hadoop Distributed File System (HDFS) directly from Eclipse: 
    1. Create a new directory
    2. Upload files to HDFS
    3. Upload a directory to HDFS
    4. Download files from HDFS
  2. To write and execute MapReduce programs that run on the Hadoop cluster.
Steps to Integrate Eclipse with a Hadoop Cluster
  1. Make sure Eclipse is installed; if not, download and install it.
  2. Make sure your Hadoop server is running; if you haven't set it up yet, set up Hadoop first.
  3. Download hadoop-eclipse-plugin-1.2.1.jar and place the JAR in the Eclipse plugins directory (instead of downloading, you can also build the plugin JAR yourself using "ant"); see the copy command just after this list.
  4. Start Eclipse: 
    1. $ECLIPSE_HOME/eclipse
  5. In the Eclipse menu, click Window --> Open Perspective --> Other --> Map/Reduce.
  6. In the Map/Reduce Locations view at the bottom, click the icon to add a new Hadoop location.
  7. Enter the ports on which HDFS and MapReduce are running:
    1. For recall, the MapReduce port (9001) is specified in $HADOOP_HOME/conf/mapred-site.xml.
    2. For recall, the HDFS port (9000) is specified in $HADOOP_HOME/conf/core-site.xml (see the port-check sketch just after this list).
    3. Enter the Hadoop user name.
  8. Once the Hadoop location is added, DFS Locations will be displayed in the Eclipse Project Explorer window (Window --> Show View --> Project Explorer), showing the directories in HDFS.
  9. Right-click the DFS location and click Connect.
  10. Once connected successfully, it will display all the HDFS folders.
  11. You can create a directory, upload files to an HDFS location, or download files to your local machine by right-clicking any of the listed directories.
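For step 3, installing the plugin amounts to copying the JAR into Eclipse's plugins directory; a minimal sketch, assuming Eclipse is installed at $ECLIPSE_HOME:

    cp hadoop-eclipse-plugin-1.2.1.jar $ECLIPSE_HOME/plugins/
    # restart Eclipse with -clean so it picks up the new plugin
    $ECLIPSE_HOME/eclipse -clean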
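For step 7, the ports can be confirmed straight from the configuration files; a quick check, assuming the Hadoop 1.x conf layout used in this post:

    # HDFS port (DFS Master), typically hdfs://localhost:9000
    grep -A 1 'fs.default.name' $HADOOP_HOME/conf/core-site.xml
    # MapReduce port (Map/Reduce Master), typically localhost:9001
    grep -A 1 'mapred.job.tracker' $HADOOP_HOME/conf/mapred-site.xml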
HDFS File Management commands
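The plugin's right-click operations correspond to the standard HDFS shell commands; a reference sketch (the /user/hduser paths are placeholders, adjust them to your own):

    hadoop fs -ls /                                    # list a directory
    hadoop fs -mkdir /user/hduser/input                # create a new directory
    hadoop fs -put sample.txt /user/hduser/input/      # upload a file to HDFS
    hadoop fs -put localdir /user/hduser/localdir      # upload a directory to HDFS
    hadoop fs -get /user/hduser/input/sample.txt .     # download from HDFS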


Possible Error you may get

ERROR
              Error: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException

SOLUTION
             Make sure all the Hadoop daemons are up and running.
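One quick way to verify, assuming a Hadoop 1.x single-node setup:

    # list the running Hadoop daemons; expect NameNode, DataNode,
    # SecondaryNameNode, JobTracker and TaskTracker
    jps
    # if any daemon is missing, restart the cluster
    $HADOOP_HOME/bin/stop-all.sh
    $HADOOP_HOME/bin/start-all.sh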
            

6 comments:

Rajagopalan said...

I tried it the same way you described, but I got the connection exception error even though I had started all the daemons and they were running.

dataanalytics said...

Hope you can share the exact error details, as we discussed via chat.

Unknown said...

I have set up hadoop-2.4.0 and I get the following error inside Eclipse when I try to connect:

Server IPC version 9 cannot communicate with client version 4

Can you help?

dataanalytics said...

It seems you have not used the proper plugin for Hadoop 2.x; I suspect you downloaded hadoop-eclipse-plugin-1.2.1.jar as per this post.

For Hadoop 2.x, download the JAR from here:

https://github.com/winghc/hadoop2x-eclipse-plugin/archive/master.zip
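Roughly, the steps would be (a sketch; the unpacked directory name follows GitHub's archive convention, and the exact ant build flags are in that repository's README):

    wget https://github.com/winghc/hadoop2x-eclipse-plugin/archive/master.zip
    unzip master.zip
    cd hadoop2x-eclipse-plugin-master
    # build the plugin JAR with ant per the repo's README, then copy it
    # into $ECLIPSE_HOME/plugins/ and restart Eclipse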

DataRockz said...

Hi Solai,

Nice post on setting up the cluster.

A clarification: the master has not started the DataNode service, and the replication factor has been set to 2, so two DataNodes are needed:

1st on the master.
2nd on the slave.

dataanalytics said...

Yes, you're right. In this case, start the DataNode from the DataNode machine:

hadoop-daemons.sh start datanode
