
How to load data into Hive automatically

Tags: load, hadoop, hive

I want to load log files into Hive tables, and I am looking for a tool that can read data from a certain directory and load it into Hive automatically. This directory may contain many subdirectories; for example, the directory is '/log' and the subdirectories are '/log/20130115', '/log/20130116', '/log/20130117'. Is there an ETL tool that can do the following: once new data is stored in the directory, the tool detects it automatically and loads it into a Hive table? Does such a tool exist, or do I have to write a script myself?

jacky zhang asked Jan 17 '13 06:01


1 Answer

You can easily do this using Hive external tables and partitioning your table by day. For example, create your table as such:

create external table mytable(...) 
partitioned by (day string) 
location '/user/hive/warehouse/mytable';

This will essentially create an empty table in the metastore and make it point to /user/hive/warehouse/mytable.

Then you can load your data in this directory with the format key=value where key is your partition name (here "day") and value is the value of your partition. For example:

hadoop fs -put /log/20130115 /user/hive/warehouse/mytable/day=20130115

Once your data is loaded there, it is in the HDFS directory, but the Hive metastore doesn't know yet that it belongs to the table, so you can add it this way:

alter table mytable add partition(day='20130115');

And you should be good to go: the metastore will be updated with your new partition, and you can now query your table on this partition.
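
To confirm that the metastore picked up the new partition, you could list the table's partitions and run a query restricted to that day. The sketch below only builds the HiveQL; the function name check_partition_sql is made up for this example, and actually running the statements requires the hive command-line client.

```shell
#!/bin/sh
# check_partition_sql DAY: print HiveQL that lists the partitions of mytable
# and counts the rows stored under one day (hypothetical helper name).
check_partition_sql() {
    printf 'show partitions mytable;\n'
    printf "select count(*) from mytable where day='%s';\n" "$1"
}

# Print the statements for the partition added above.
check_partition_sql 20130115
```

On a machine with Hive installed, the statements can be passed to the client with hive -e "$(check_partition_sql 20130115)". Filtering on day lets Hive read only that partition's directory.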

This should be trivial to script. You can create a cron job that runs once a day, executes these commands in order, and uses the date command to find the partition to load. For example, run this command:

hadoop fs -test -e /log/`date +%Y%m%d`

Then check $?: if it is 0, the path exists, and you can transfer it and add the partition as described above.
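
The cron job described above could look roughly like the following sketch. It assumes the logs sit on the local filesystem (as the -put command implies), so it checks for the directory with a plain shell test rather than hadoop fs -test. SRC_ROOT, TABLE, TABLE_DIR and the function names are made up for this example, and "if not exists" is added so that re-running the job for the same day does not fail on the partition step.

```shell
#!/bin/sh
# Sketch of a daily loader; all variable and function names are hypothetical.
SRC_ROOT="${SRC_ROOT:-/log}"                           # where dated log directories appear
TABLE="${TABLE:-mytable}"                              # Hive table from the answer
TABLE_DIR="${TABLE_DIR:-/user/hive/warehouse/mytable}" # its HDFS location

# partition_dir DAY: print the HDFS directory for one day's partition.
partition_dir() {
    printf '%s/day=%s\n' "$TABLE_DIR" "$1"
}

# add_partition_sql DAY: print the HiveQL that registers the partition.
add_partition_sql() {
    printf "alter table %s add if not exists partition (day='%s');\n" "$TABLE" "$1"
}

# load_day DAY: copy one day's logs into HDFS and register the partition.
# Returns non-zero if the source directory is missing or a step fails.
load_day() {
    day="$1"
    if [ ! -d "$SRC_ROOT/$day" ]; then
        echo "no data for $day yet" >&2
        return 1
    fi
    hadoop fs -put "$SRC_ROOT/$day" "$(partition_dir "$day")" &&
        hive -e "$(add_partition_sql "$day")"
}
```

A cron job would then call load_day "$(date +%Y%m%d)" once a day. If the logs already live in HDFS, the directory check would become hadoop fs -test -d and the copy would become hadoop fs -mv (or -cp) instead.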

Charles Menguy answered Sep 25 '22 12:09
