I have two tables,tab1 & tab2.
tab1(T1) tab2(T2)
a1 b1
b1 c1
c1 f1
d1 g1
I am looking for the values from table T1 that are not present in T2.
In this case, the output should be a1 d1
I have tried with the following query but couldn't get the right solution.
select distinct tab1.T1 from tab1 left semi join tab2 on (tab1.T1!=tab2.T2);
Though the IN operator in hive works just fine in this case as well but NOT IN doesn't. If I replace NOT IN with IN operator, that works. Inact using NOT IN with a list of strings specified works too but it somehow does not work with select statement.
When using NOT IN you should add is not null to the inner query, unless you are 100% sure that the relevant column does not contain null values. One null value is enough to cause your query to return no results.
The easiest way to select specific columns in the Hive query is by specifying the column name in the select statement. SELECT col1, col3, col4 .... FROM Table1; But imagine your table contains many columns (i.e : more than 100 columns) and you need to only exclude a few columns in the select statement.
Hive supports subqueries only in the FROM clause (through Hive 0.12). The subquery has to be given a name because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names.
SELECT t1.str
FROM tab1 t1
LEFT OUTER JOIN tab2 t2 ON t1.str = t2.str
WHERE t2.str IS NULL;
Result:
OK
a1
d1
"Why is the t2.str is null condition there": Left outer joins ensure that all values from the first table are included in the results. So then what happens when there are no values in the second table: in that case all of the columns from the second table are reported as null.
So in the case above we are searching precisely for the condition that the second table entries are missing - and thus we:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With