How do I do an EXCEPT
clause (like SQL) in Hive QL
I have 2 tables, and each table is a column of unique ids.
I want to find the list of ids that are only in table 1 but not in table 2
Table 1
apple
orange
pear
Table 2
apple
orange
In SQL you can do an EXCEPT clause (http://en.wikipedia.org/wiki/Set_operations_%28SQL%29) but you can't do that in Hive QL
We can use NOT EXISTS clause in Hive as MINUS equivalent.
SELECT t1.id FROM t1 WHERE NOT EXISTS (SELECT 1 from t2 WHERE t2.id = t1.id);
I don't think there's any built-in way to do this but a LEFT OUTER JOIN
should do the trick.
This selects all Ids from table1
that do not exist in table2
:
SELECT t1.id FROM table1 t1 LEFT OUTER JOIN table2 t2 ON (t1.id=t2.id) WHERE t2.id IS NULL;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With