I'm new to Hive and I wanted to know if insert overwrite will overwrite an existing table I have created. I want to filter an already created table, let's call it TableA, to only select the rows where age is greater than 18. Can I achieve this using insert overwrite table?
I'm thinking of writing something like:
INSERT OVERWRITE TABLE TableA SELECT a.Age FROM TableA a WHERE a.Age >= 18;
There are NA entries in the table I created, but I assume that after I filter this table there will be no NAs in the Age column, right?
Description. The INSERT OVERWRITE DIRECTORY with Hive format overwrites the existing data in the directory with the new values using Hive SerDe. Hive support must be enabled to use this command. The inserted rows can be specified by value expressions or result from a query.
The INSERT OVERWRITE statement overwrites the existing data in the table using the new values. The inserted rows can be specified by value expressions or result from a query.
Conclusion. In summary, the difference between Hive INSERT INTO and INSERT OVERWRITE is that INSERT INTO appends data to Hive tables and partitioned tables, while INSERT OVERWRITE removes the existing data from the table and then inserts the new data.
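To make the append-versus-replace difference concrete, here is a minimal HiveQL sketch. The tables demo_src and demo_dst and their columns are hypothetical names invented for this illustration, and INSERT ... VALUES assumes Hive 0.14 or later.

```sql
-- Hypothetical tables, used only to illustrate the two statements.
CREATE TABLE IF NOT EXISTS demo_src (name STRING, age INT);
CREATE TABLE IF NOT EXISTS demo_dst (name STRING, age INT);

-- Seed the source with two rows.
INSERT INTO TABLE demo_src VALUES ('Ann', 25), ('Bob', 17);

-- INSERT INTO appends: after running it twice, demo_dst holds four rows.
INSERT INTO TABLE demo_dst SELECT name, age FROM demo_src;
INSERT INTO TABLE demo_dst SELECT name, age FROM demo_src;

-- INSERT OVERWRITE replaces: afterwards demo_dst holds only the two rows
-- returned by the SELECT, and the four earlier rows are gone.
INSERT OVERWRITE TABLE demo_dst SELECT name, age FROM demo_src;
```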
Filtering a table and inserting the result back into the same table is not supported in Hive yet.
I would suggest the following steps in your case:
1. Create a similar table, say tabB, with the same structure.
create table tabB like tableA;
2. Then apply your filter and insert the result into this new table.
INSERT OVERWRITE TABLE tabB SELECT a.Age FROM TableA a WHERE a.Age >= 18;
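Put together, the two steps might look like the sketch below. It assumes TableA is not partitioned and that Age is a numeric column. It uses SELECT * rather than a.Age so that the selected columns match the structure tabB copied from TableA. That is a substitution on my part, so adjust the column list if your table differs. On the NA question: in Hive a missing value is NULL, and NULL >= 18 does not evaluate to true, so those rows should not reach tabB. If Age is instead a string column holding the literal text 'NA', it is worth verifying the result with the check at the end.

```sql
-- 1. Create an empty table with the same structure as TableA.
CREATE TABLE IF NOT EXISTS tabB LIKE TableA;

-- 2. Copy only the rows that pass the filter.
--    SELECT * keeps every column so each row matches tabB's structure.
--    Rows with a NULL Age are left out, because NULL >= 18 is not true.
INSERT OVERWRITE TABLE tabB
SELECT *
FROM TableA a
WHERE a.Age >= 18;

-- 3. Optional check: count the rows in tabB whose Age is still NULL.
SELECT COUNT(*) AS null_ages
FROM tabB
WHERE Age IS NULL;
```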
Hope this helps.