Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive LEFT SEMI JOIN for 'NOT EXISTS'

I have two tables with a single key column. Keys in table a are subset of all keys in table b. I need to select keys from table b that are NOT in table a.

Here is a citation from Hive manual: "LEFT SEMI JOIN implements the uncorrelated IN/EXISTS subquery semantics in an efficient way. As of Hive 0.13 the IN/NOT IN/EXISTS/NOT EXISTS operators are supported using subqueries so most of these JOINs don't have to be performed manually anymore. The restrictions of using LEFT SEMI JOIN is that the right-hand-side table should only be referenced in the join condition (ON-clause), but not in WHERE- or SELECT-clauses etc."

They use this example for illustration:

    SELECT a.key, a.value FROM a WHERE a.key IN (SELECT b.key FROM B);

Is equivalent to

    SELECT a.key, a.val FROM a LEFT SEMI JOIN b ON (a.key = b.key);

However, what I need to do is first example with 'NOT IN;. Unfortunately this syntax is not supported in Hive 0.13. It's for illustration only:

    SELECT a.key, a.value FROM a WHERE a.key NOT IN (SELECT b.key FROM B);

I searched this site for recommendations, and saw this example:

    SELECT a.key FROM a LEFT OUTER JOIN b ON a.key = b.key WHERE b.key IS NULL;

It does not work as expected. When I join a.key NOT in b and a.key IN b, I don't get the original a this way. Maybe that is because this query cannot do the trick, note bold text - b.key should not appear in WHERE.

What should I do then? Any other trick? Thanks!

P.S. I cannot share any real data; it's a pretty simple example, where keys in a are all included in b and a is a subset of b.

like image 291
mel Avatar asked Jul 30 '14 15:07

mel


1 Answers

Or you can try

SELECT a.key FROM a LEFT ANTI JOIN b ON a.key = b.key
like image 73
user3512680 Avatar answered Sep 28 '22 04:09

user3512680