I am hoping to run an import into Hive on a cron job, and was hoping that just using
"load data local inpath '/tmp/data/x' into table X" would be sufficient.
Will subsequent runs overwrite what's already in the table, or will they append?
Hive provides multiple ways to add data to tables. We can use DML (Data Manipulation Language) queries in Hive to import or add data to a table. One can also put a table's data files directly into Hive's storage with HDFS commands. In case we have data in relational databases like MySQL, Oracle, IBM DB2, etc.
INSERT OVERWRITE DIRECTORY with Hive format overwrites the existing data in the directory with the new values, using a Hive SerDe. Hive support must be enabled to use this command. The inserted rows can be specified by value expressions or can result from a query.
The load data inpath command is used to load data into a Hive table. 'LOCAL' signifies that the input file is on the local file system. If 'LOCAL' is omitted, Hive looks for the file in HDFS. load data inpath '/directory-path/file.
This site http://wiki.apache.org/hadoop/Hive/LanguageManual is your friend when dealing with Hive. :)
The page that addresses loading data into Hive is http://wiki.apache.org/hadoop/Hive/LanguageManual/DML and it states that:
if the OVERWRITE keyword is used then the contents of the target table (or partition) will be deleted and replaced with the files referred to by filepath. Otherwise the files referred by filepath will be added to the table. Note that if the target table (or partition) already has a file whose name collides with any of the filenames contained in filepath - then the existing file will be replaced with the new file.
In your case, you are not using the OVERWRITE keyword, so the files will be added to the table. (Unless a file with the same name is already in the table, in which case the existing file is replaced by the new one.)
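For comparison, here is a minimal sketch of the two forms side by side, reusing the path and table name from the question. The comments only restate the behavior quoted from the manual above, and it may be worth checking them against the Hive version you actually run.

```sql
-- Append: files under the path are added to table X.
-- Per the quoted rule, an existing file with the same name is replaced.
LOAD DATA LOCAL INPATH '/tmp/data/x' INTO TABLE X;

-- Replace: the current contents of X are deleted, then the new files are loaded.
LOAD DATA LOCAL INPATH '/tmp/data/x' OVERWRITE INTO TABLE X;
```

For a cron import that should accumulate data, the first form is the one to use. Since the quoted rule keys on file names, giving each run's file a distinct name avoids replacing the previous load.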