
How to delete rows in a Hive/Hadoop database

I'm a newbie with Hadoop and Hive. I want to delete certain rows from my database, which is on Hive/Hadoop. I know it's not supported out of the box, and that HDFS is essentially a write-once file system. What are the best approaches for accomplishing this? If anyone has done this before, could you share what you learned and the procedure you used?

Thanks!

Sunny asked Nov 29 '22 11:11


1 Answer

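In Big Data there really aren't deletes in the usual sense: files on HDFS are written once and are not modified in place, so removing rows means rewriting the data that contains them. That said, you can overwrite your table or partition if it isn't too big, or isolate your deletes to a particular partition like JamCon suggests.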

For datasets that are not too large, you can do something like:

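-- Rewrite the table with every row except the ones to delete.
-- Note: rows whose ID is NULL are dropped too, because NULL NOT IN (...) is never true.
INSERT OVERWRITE TABLE mytable
SELECT * FROM mytable
WHERE ID NOT IN ('delete1', 'delete2', 'delete3');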
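
If the table is partitioned, the same idea can be limited to the one partition that holds the unwanted rows, so the rest of the table is never rewritten. Here is a rough sketch of that, not a tested recipe for your schema: the partition column dt, the date value, and the column names id and payload are all made up for illustration, so substitute your own.

```sql
-- Hypothetical layout: mytable(id STRING, payload STRING) PARTITIONED BY (dt STRING).
-- Only the dt = '2022-11-29' partition is rewritten; other partitions are untouched.
INSERT OVERWRITE TABLE mytable PARTITION (dt = '2022-11-29')
SELECT id, payload              -- list the non-partition columns only
FROM mytable
WHERE dt = '2022-11-29'         -- read back the same partition
  AND (id IS NULL               -- keep NULL ids, which NOT IN alone would drop
       OR id NOT IN ('delete1', 'delete2', 'delete3'));
```

With a static partition spec like this, the SELECT list must leave out the partition column, which is why the columns are named explicitly instead of using SELECT *. It is probably worth running the SELECT on its own first and checking the row count before overwriting anything.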
Jerome Banks answered Dec 06 '22 16:12