lets say, I created Hive external table "myTable" from file myFile.csv ( located in HDFS ).
myFile.csv is changed every day, then I'm interested to update "myTable" once a day too.
Is there any HiveQL query that tells to update the table every day?
Thank you.
P.S.
I would like to know if it works the same way with directories: lets say, I create Hive partition from HDFS directory "myDir", when "myDir" contains 10 files. next day "myDIr" contains 20 files (10 files were added). Should I update Hive partition?
There are two types of tables in Hive basically.
One is Managed table managed by hive warehouse whenever you create a table data will be copied to internal warehouse.
You can not have latest data in the query output
.
Other is external table in which hive will not copy its data to internal warehouse
.
So whenever you fire query on table then it retrieves data from the file.
SO you can even have the latest data in the query output.
That is one of the goals of external table.
You can even drop the table and the data is not lost.
If you add a LOCATION '/path/to/myFile.csv'
clause to your table create statement, you shouldn't have to update anything in Hive. It will always use the latest version of the file in queries.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With