I'm a newbie with Hadoop and Hive. I want to delete certain rows from my database, which is on Hive/Hadoop. I know it's not supported out of the box, and that Hadoop is a read-only file system. I'm curious what the best approaches are for accomplishing this. If anyone has done this before, could you share your learnings/procedures?
Thanks!
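In Big Data there really aren't deletes in the usual sense: data is removed by rewriting it without the unwanted rows rather than by editing it in place. That said, you can overwrite your table or a partition if it isn't too big, or isolate your deletes to a particular partition, as JamCon suggests.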
For datasets that are not too large, you can rewrite the table without the rows you want to remove:
INSERT OVERWRITE TABLE mytable
SELECT * FROM mytable
WHERE ID NOT IN ( 'delete1', 'delete2', 'delete3');
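If the table is partitioned, the same idea can be limited to the one partition that holds the rows to remove, so the rest of the table is not rewritten. The following is only a sketch under assumptions: the partition column dt, its value '2013-01-01', and the column names id and payload are hypothetical and stand in for whatever your table actually uses.

```sql
-- Assumption: mytable is partitioned by a string column dt and has
-- the regular columns id and payload (hypothetical names).
-- Only the dt = '2013-01-01' partition is rewritten; other partitions are untouched.
INSERT OVERWRITE TABLE mytable PARTITION (dt = '2013-01-01')
SELECT id, payload            -- list the non-partition columns only
FROM mytable
WHERE dt = '2013-01-01'
  AND id NOT IN ('delete1', 'delete2', 'delete3');
```

Note that with NOT IN, rows whose id is NULL do not pass the filter, so they are dropped as well. If such rows should be kept, a condition such as (id IS NULL OR id NOT IN (...)) is one way to write it.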