Is there an easy way to dedupe a Hive table?

Question

I have a set of Hive tables on Elastic MapReduce that contain some duplicate rows. Is there an easy way to dedupe these tables?

What comes to mind is dumping the data to a set of Pig-digestible files, firing up Pig, and using a DISTINCT query to regenerate the table. That seems like quite a bit of work, though, so I'm wondering if there's an easier way.

www · Accepted Answer

A single query should remove the duplicates by overwriting the table with its own distinct rows:

INSERT OVERWRITE TABLE table
SELECT DISTINCT Col1, Col2, ..., ColN FROM table;
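To make the pattern concrete, here is a rough sketch against a made-up table. The table name `page_views` and the columns `user_id`, `url`, and `view_time` are assumptions for illustration only and do not come from the question. The row counts before and after are just a sanity check.

```sql
-- Hypothetical table and column names, for illustration only.

-- Row count before deduplicating, to compare against afterwards.
SELECT COUNT(*) FROM page_views;

-- Rewrite the table so that each distinct row appears exactly once.
-- List every column, in the same order as the table definition,
-- because INSERT matches the selected columns to the table by position.
INSERT OVERWRITE TABLE page_views
SELECT DISTINCT user_id, url, view_time
FROM page_views;

-- Row count after deduplicating; it should be less than or equal to the first count.
SELECT COUNT(*) FROM page_views;
```

Note that this only collapses rows that are identical in every listed column. Since the statement overwrites the table in place, it is probably worth trying it on a copy first.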

Tags:

hive

apache-pig

elastic-map-reduce

rongenre
