Tuesday, December 16, 2014

Configuring the Hive Metastore as MySQL or any RDBMS


Hive stores its metadata (the table and column definitions, much like an RDBMS does)
outside HDFS, while the data it processes lives in HDFS.

By default, Hive stores its metastore in Derby, a lightweight embedded database
that can serve only a single instance at a time. If you try to start multiple instances of Hive, you will get an error like
"Another instance of Derby may have already booted the database".

In this post we will see how to configure another RDBMS (MySQL or PostgreSQL) as the Hive metastore.

Copy (or rename) hive-default.xml.template to hive-site.xml under $HIVE_HOME/conf, then open it for editing:
hadoop@solai# cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml
hadoop@solai# vim.tiny $HIVE_HOME/conf/hive-site.xml

Change the values of the JDBC connection properties (javax.jdo.option.ConnectionURL, javax.jdo.option.ConnectionDriverName, javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword) so that Hive points at MySQL instead of Derby.
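These properties might look like the following sketch; the host, port, and the hivedb / mysqlroot / hive@123 values are assumptions matching the database and user created in the steps below:

```xml
<!-- Sketch of the relevant hive-site.xml properties.
     Host, port, database name and credentials are assumptions;
     adjust them to your environment. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hivedb</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>mysqlroot</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive@123</value>
</property>
```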
Download the MySQL JDBC connector and place "mysql-connector-java-5.x.xx-bin.jar" into $HIVE_HOME/lib. If you downloaded the tar.gz, extract it first; the jar sits inside the extracted directory:
hadoop@solai# tar -xzf /home/hadoop/Downloads/mysql-connector-java-5.1.31.tar.gz
hadoop@solai# cp mysql-connector-java-5.1.31/mysql-connector-java-5.1.31-bin.jar $HIVE_HOME/lib
In Mysql create database "hivedb" and load the hive schema to the database "hivedb"
mysql> create database hivedb;
mysql> use hivedb;

## the following will create the Hive schema in the MySQL database.
## note: the mysql client does not expand shell variables, so substitute
## the actual path for $HIVE_HOME.
mysql> source $HIVE_HOME/scripts/metastore/upgrade/mysql/hive-schema-0.12.0.mysql.sql;

It's important to restrict the user from altering / dropping hivedb. Note that the part after the @ is the host the user connects from, so use 'localhost' consistently:
mysql> CREATE USER 'mysqlroot'@'localhost' IDENTIFIED BY 'hive@123';

mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'mysqlroot'@'localhost';

mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON hivedb.* TO 'mysqlroot'@'localhost';


mysql> quit;
Enter the Hive CLI to create a table:

hive> create table testHiveMysql(uname string, uplace string);
Enter mysql to check the schema information created from the Hive environment (run "use hivedb;" first). The following queries return the table and column information:
mysql> select * from TBLS;
mysql> select * from COLUMNS_V2;
mysql> show tables;
show tables will return all the metastore tables pertaining to the Hive schema.
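To map each Hive table to its columns, the two metastore tables can be joined through the SDS storage-descriptor table. This is a sketch based on the Hive 0.12 metastore schema; verify the join keys against your hive-schema-0.12.0.mysql.sql before relying on it:

```sql
-- Sketch: list every column of every Hive table recorded in the metastore.
-- TBLS links to SDS via SD_ID, and SDS links to COLUMNS_V2 via CD_ID.
SELECT t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c ON s.CD_ID = c.CD_ID
ORDER BY t.TBL_NAME, c.INTEGER_IDX;
```

For the testHiveMysql table created above, this should show the uname and uplace columns with type string.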

Thursday, November 27, 2014

Upgrade Hadoop with Latest version - Simple steps


Here I've listed a few simple steps to upgrade the Hadoop NameNode without losing existing data in the cluster.

It's advisable to take a backup of the Hadoop metadata placed under the directory configured in dfs.namenode.name.dir (dfs.name.dir in older versions).


1) stop-yarn.sh

2) stop-dfs.sh

3) Download and configure the latest version of Hadoop

4) cd $HADOOP_PREFIX/etc/hadoop
    in hdfs-site.xml,
       change dfs.namenode.name.dir (and, in the case of a pseudo-distributed node, dfs.datanode.data.dir) to point to the old Hadoop version's storage paths
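As a sketch, the hdfs-site.xml entries might look like this; the /home/hadoop/hadoop-old/... paths are placeholders, so point them at the directories your old installation actually used:

```xml
<!-- hdfs-site.xml sketch: reuse the old installation's storage directories
     so the new version picks up the existing metadata and blocks.
     The paths below are assumptions; substitute your own. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hadoop/hadoop-old/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hadoop/hadoop-old/dfs/data</value>
</property>
```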

5) ./sbin/hadoop-daemon.sh start namenode -upgrade

6) You will see the following message in the Web UI (namenodeIP:50070): "Upgrade in progress. Not yet finalized." and SafeMode is ON

7) ./bin/hdfs dfsadmin -finalizeUpgrade

8) Investigate the NameNode log, which should contain information like this:
Upgrade of local storage directories.
   old LV = -57; old CTime = 0.
   new LV = -57; new CTime = 1417064332016

9) SafeMode will turn off automatically once all these steps are complete.

10) start the DFS
    ./sbin/start-dfs.sh --config $HADOOP_PREFIX/etc/hadoop

11) start the Yarn
    ./sbin/start-yarn.sh --config $HADOOP_PREFIX/etc/hadoop