
updating Hive external table with HDFS changes

Let's say I created a Hive external table "myTable" from the file myFile.csv (located in HDFS).

myFile.csv changes every day, so I would like "myTable" to be refreshed once a day as well.

Is there any HiveQL statement that tells Hive to update the table every day?

Thank you.

P.S.

I would also like to know whether it works the same way with directories. Let's say I create a Hive partition from the HDFS directory "myDir" while "myDir" contains 10 files. The next day "myDir" contains 20 files (10 files were added). Do I need to update the Hive partition?

asked Jun 10 '13 by sunny

2 Answers

There are basically two types of tables in Hive.

One is the managed table, which is controlled by the Hive warehouse: when you load data into it, the data is moved into the internal warehouse, so queries do not see later changes to the original file.

The other is the external table, whose data Hive does not copy into its internal warehouse. Whenever you run a query against the table, Hive reads the data straight from the files at its location, so the query output always reflects the latest data. That is one of the main purposes of external tables. You can even drop the table and the underlying data is not lost.

answered by Balaswamy Vaddeman
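For illustration, a minimal HiveQL sketch of the difference described above; the column names and HDFS paths are assumptions, not taken from the question:

    -- Managed table: LOAD DATA INPATH moves the HDFS file into Hive's
    -- warehouse directory, so later changes to the original file are not visible.
    CREATE TABLE managed_copy (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    LOAD DATA INPATH '/data/myFile.csv' INTO TABLE managed_copy;

    -- External table: only metadata is stored; each query reads whatever
    -- files currently sit at the given location.
    CREATE EXTERNAL TABLE myTable (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/mytable_dir';

    -- Dropping the external table removes the metadata only;
    -- the files under /data/mytable_dir stay in HDFS.
    DROP TABLE myTable;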


If you add a LOCATION '/path/to/myFile.csv' clause to your CREATE TABLE statement, you shouldn't have to update anything in Hive. Queries will always read the latest version of the file.

answered by Mike Park
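A minimal sketch of that approach; note that Hive's LOCATION clause points at a directory, so in practice it names the directory holding myFile.csv (the path and columns below are assumptions):

    -- External table over the directory that contains myFile.csv.
    CREATE EXTERNAL TABLE myTable (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/path/to/csv_dir';

    -- No daily refresh is needed: every query scans the files currently
    -- in /path/to/csv_dir, so tomorrow's myFile.csv is picked up automatically.
    SELECT COUNT(*) FROM myTable;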