Thursday, April 23, 2015

return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

I have created a partitioned table in Hive. When I did an insert into the partition table, I was stuck with this error:

Error

    Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

....
[Fatal Error] Operator FS_2 (id=2): Number of dynamic partitions exceeded hive.exec.max.dynamic.partitions.pernode.

The error in the first line is a little confusing; if you scroll up the console you will find the second line, which is the real cause of the exception.

Solution

  bdalab@solai:/opt$ hive

  hive> set hive.exec.max.dynamic.partitions.pernode=500;

By default hive.exec.max.dynamic.partitions.pernode is set to 100; if the number of dynamic partitions created on a node exceeds this limit, you will get this error. Just raise the value to match your requirement to get rid of it.
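
For completeness, here is a minimal sketch of a full session, assuming a hypothetical target table logs partitioned by dt and a staging table logs_stage (the property names are standard Hive settings; the table and column names are only illustrative):

    bdalab@solai:/opt$ hive
    hive> -- enable dynamic partitioning for this session
    hive> set hive.exec.dynamic.partition=true;
    hive> set hive.exec.dynamic.partition.mode=nonstrict;
    hive> -- raise the per-node limit (and the overall limit, if needed)
    hive> set hive.exec.max.dynamic.partitions.pernode=500;
    hive> set hive.exec.max.dynamic.partitions=1000;
    hive> INSERT INTO TABLE logs PARTITION (dt)
        > SELECT col1, col2, dt FROM logs_stage;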

Wednesday, April 22, 2015

how to : Execute HDFS commands from DataNode

how to : Work on NameNode (HDFS) from DataNode

These commands operate on the NameNode (HDFS) from a DataNode or from any other system with Hadoop installed (which may or may not be part of the Hadoop cluster).

Execute an HDFS 'fs' command from a DataNode

    bdalab@solai:/opt$ hadoop fs -fs hdfs://masterNodeIP:9000/ -rm /input/log.csv


The above command is run from the DataNode, and the file '/input/log.csv' is removed from HDFS on the remote NameNode.
here, masterNodeIP -> IP address of the remote NameNode

List all the files on the NameNode from a DataNode

    bdalab@solai:/opt$ hadoop fs -fs hdfs://masterNodeIP:9000/ -ls /


Create the dir 'pjt' on the NameNode from a DataNode

    bdalab@solai:/opt$ hadoop fs -fs hdfs://masterNodeIP:9000/ -mkdir /pjt

All of the above commands are run from the DataNode and executed against the NameNode.
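
If you are not sure which NameNode URI to pass to -fs, you can read it from the local configuration on any node. This is just a quick check, assuming the Hadoop binaries are on the PATH:

    bdalab@solai:/opt$ hdfs getconf -confKey fs.defaultFS
    hdfs://masterNodeIP:9000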

Tuesday, April 21, 2015

How to : Move the file from DataNode to NameNode

1 step to Move the file from DataNode to NameNode


In some cases we need to move a file that is available on a DataNode's local filesystem, but not yet in HDFS, into the HDFS cluster.

The same command is also useful when you want to push a file from any system with Hadoop installed (which may or may not be part of the Hadoop cluster) to any NameNode.

Move a file from the DataNode system to the Hadoop cluster

    bdalab@solai:/opt$ hadoop fs -fs hdfs://NameNodeIP:9000/ -put /FileToMove /hadoopFolderName/


here,
NameNodeIP -> IP address of the NameNode system
FileToMove -> the file to be moved to HDFS

OR
    bdalab@solai:/opt$ hadoop fs -fs hdfs://10.0.18.269:9000/ -put /FileToMove /hadoopFolderName/
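
To confirm the upload from the same DataNode, list the target folder with the same remote URI (a quick check, reusing the placeholders above):

    bdalab@solai:/opt$ hadoop fs -fs hdfs://NameNodeIP:9000/ -ls /hadoopFolderName/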

Friday, April 17, 2015

1 step to Move the file from Hadoop HDFS to remote system


One command to move a file from the Hadoop HDFS cluster to a remote system.

In some cases we need to move a MapReduce output file from
Hadoop HDFS to a system that does not have Hadoop installed.

Move a file from the Hadoop cluster to a remote system (non-Hadoop system)

    bdalab@solai:/opt$ hadoop dfs -cat hdfs://NameNodeIP:9000/user/part-* | ssh userName@remoteSystemIP 'cat - > /home/hadoop/MRop'


here,
part-* -> the file(s) to be moved from HDFS
userName -> username on the remote system
remoteSystemIP -> IP address of the remote system
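
For large outputs the same pipe can be compressed on the fly, which usually saves transfer time. This is only a sketch of the idea, assuming gzip is available on both ends:

    bdalab@solai:/opt$ hadoop fs -cat hdfs://NameNodeIP:9000/user/part-* | gzip | ssh userName@remoteSystemIP 'gunzip -c > /home/hadoop/MRop'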

1 step to Move the file from remote system to Hadoop HDFS

One command to move a file from a remote system to the Hadoop HDFS cluster.

In some cases we need to move a file from a system without
Hadoop installed into the Hadoop cluster.

Move a file from a non-Hadoop system to the Hadoop cluster

    bdalab@solai:/opt$ cat moveToHdfs.txt | ssh userName@NameNodeIP "hadoop dfs -put - hadoopDirName/"


here,
moveToHdfs.txt -> the file to be moved to HDFS
userName -> username on the NameNode system
NameNodeIP -> IP address of the NameNode
hadoopDirName -> directory in HDFS


If you face an error like

    bash: hadoop: command not found

re-run the above command with the full path to the hadoop binary:

    bdalab@solai:/opt$ cat moveToHdfs.txt | ssh userName@NameNodeIP "/opt/hadoop-2.6.0/bin/hadoop dfs -put - hadoopDirName/"
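
To verify the upload, you can list the target directory over the same ssh connection (a quick check, using the same placeholders and full hadoop path):

    bdalab@solai:/opt$ ssh userName@NameNodeIP "/opt/hadoop-2.6.0/bin/hadoop fs -ls hadoopDirName/"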

Saturday, April 11, 2015

4 best tools for Big Data visualization

4 best tools for Big Data analytics and visualization
Visualization plays a major role in big data analytics. The human role in
visualization is largely limited to:
      Identifying visual patterns and anomalies
      Seeing patterns across groups.
Once you have done your big data analytics with your favourite tools (Hadoop,
Spark or machine learning), the next step is to impress the customer with dashboards/graphics that support better business decisions.


Big Data visualization tools From Apache

Zeppelin

Kylin

Note: both are currently part of the Apache Incubator.


Other interesting tools



I have hands-on experience with Gephi. I am now working with the Apache Incubator projects Zeppelin and Kylin, and will update this post with a working model.

Tuesday, April 7, 2015

Configure UBER mode - MapReduce job for small dataset

Uber job configuration in YARN - Hadoop2
previous post - what is Uber mode

How to configure an uber job?

    To enable uber jobs, you need to set the following properties in mapred-site.xml (or per job):
    mapreduce.job.ubertask.enable=true
    mapreduce.job.ubertask.maxmaps=9 (default 9)
    mapreduce.job.ubertask.maxreduces=0 (default 1)
    mapreduce.job.ubertask.maxbytes=4096

mapreduce.job.ubertask.maxbytes

    The value is in bytes, so 4096 above is only 4 KB (for 4 MB you would set 4194304). The total input size of a job must be less than or equal to this value for the job to be uberized.
    Ex.: say you have a dataset 5 MB in size, but you have set 4 MB (4194304) for mapreduce.job.ubertask.maxbytes; then uber mode will not be used.
    If you omit this property, it defaults to the HDFS block size, so any job whose total input is larger than one block will not run in uber mode.
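
Instead of editing the XML, the same properties can be passed per job on the command line. A rough sketch, assuming the stock wordcount example jar under /opt/hadoop-2.6.0 and hypothetical /tiny-input and /tiny-output paths:

    bdalab@solai:/opt$ hadoop jar /opt/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount \
        -D mapreduce.job.ubertask.enable=true \
        -D mapreduce.job.ubertask.maxmaps=9 \
        -D mapreduce.job.ubertask.maxreduces=1 \
        -D mapreduce.job.ubertask.maxbytes=4194304 \
        /tiny-input /tiny-output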

UBER mode in YARN Hadoop2 - Running MapReduce jobs in small dataset

what is Uber mode in YARN - Hadoop2

You might have seen a line like this while running MapReduce in Hadoop2:

    mapreduce.Job: Job job_1387204213494_0005 running in uber mode : false

what is UBER mode in Hadoop2?

    Normally the mappers and reducers run in containers allocated by the ResourceManager (RM); the RM creates a separate container for each mapper and reducer.
    The uber configuration allows the mappers and reducers to run in the same process as the ApplicationMaster (AM).

Uber jobs :

    Uber jobs are jobs that are executed within the MapReduce ApplicationMaster, rather than asking the RM to create separate mapper and reducer containers.
    The AM runs the map and reduce tasks within its own process and avoids the overhead of launching and communicating with remote containers.
Why

    If you have a small dataset and want to run MapReduce on that small amount of data, the uber configuration will help you out by cutting the extra time MapReduce normally spends launching and managing the mapper and reducer containers.
Can I configure/have uber mode for all MapReduce jobs?

    As of now, only
       map-only jobs and
       jobs with a single reducer are supported.
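
A quick way to confirm whether a particular run was uberized is to grep the client output for the uber flag. A small sketch, assuming the stock wordcount example jar and hypothetical input/output paths:

    bdalab@solai:/opt$ hadoop jar /opt/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar \
        wordcount /tiny-input /tiny-output 2>&1 | grep "uber mode"
    # when uberized, the line ends with: running in uber mode : true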

Saturday, April 4, 2015

Easy way to recover the deleted files/dir in Hadoop hdfs

Easy way to recover the deleted files/dir in hdfs
In some cases files or dirs get deleted accidentally.
Is there any way to recover them?

   By default Hadoop deletes the files/dirs permanently. It does have a Trash feature, but it is not enabled by default.

   Configuring fs.trash.interval and fs.trash.checkpoint.interval in Hadoop's core-site.xml makes deleted files/dirs move into the .Trash folder instead.

   The .Trash folder is located in HDFS at /user/$USER/.Trash.

configuring core-site.xml

<property>
<name>fs.trash.interval</name>
<value>120</value> 
</property>

<property>
<name>fs.trash.checkpoint.interval</name>
<value>45</value>
</property>

   With the above configuration, all deleted files/dirs are moved to the .Trash folder and kept there for two hours (120 minutes).
   The checkpoint runs every 45 minutes and deletes any files/dirs in .Trash that are more than 2 hours old.

Restart Hadoop
    Once you modify core-site.xml, stop and start Hadoop.

   Here is an example of removing a dir:
hadoop@solai# hadoop fs -rmr /testTrash
15/04/05 01:10:14 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 120 minutes, Emptier interval = 45 minutes. Moved: 'hdfs://127.0.0.1:9000/testTrash' to trash at: hdfs://127.0.0.1:9000/user/bdalab/.Trash/Current
   The message clearly says that the deleted folder is moved to /user/bdalab/.Trash/Current, where the data is kept for 2 hours with a checkpoint interval of 45 minutes.

List the deleted files/dirs in the .Trash folder using -ls
hadoop@solai# hadoop fs -ls hdfs://127.0.0.1:9000/user/bdalab/.Trash/Current/testTrash
You can view the content (with -cat) or move the files back to the original path.
hadoop@solai# hadoop fs -mv hdfs://127.0.0.1:9000/user/bdalab/.Trash/Current/testTrash /testTrash
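
If you want to empty the trash right away instead of waiting for the checkpoint, the standard -expunge command forces a checkpoint and removes anything older than the retention interval:

hadoop@solai# hadoop fs -expunge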

3 simple steps to resolve linux read-only file system to read-write - ubuntu

My internal partition, which had read-write permission earlier,
fell back into read-only mode.

Issue

When I tried to give the mount full permissions with
root@boss[bin]# sudo chmod 777 -R /media/bdtools
I got "chmod: changing permissions of ... : Read-only file system".
Then I tried to re-mount the partition, as many people have suggested on the web:
root@boss[bin]# mount -o remount,rw /dev/sda9 /media/bdalab/bdtools
and got a message that the filesystem/drive is now write-protected.
Solution
I went through the dmesg log and saw an error like
ext4_put_super:792: Couldn't clean up the journal

I followed the approach below to overcome the read-only filesystem issue (see the sketch after the note):
1) unmount the partition
2) fsck /dev/sda9
3) remount the partition


Note : before running fsck, it is advisable to get an idea of what it does to the filesystem.
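
A minimal sketch of those three steps, assuming the same /dev/sda9 partition and mount point as above:

    root@boss[bin]# umount /media/bdalab/bdtools            # 1) unmount the partition
    root@boss[bin]# fsck /dev/sda9                          # 2) check and repair the filesystem (answer the prompts, or use -y)
    root@boss[bin]# mount /dev/sda9 /media/bdalab/bdtools   # 3) remount the partition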
