
How to use NOT IN in Hive

Suppose I have the two tables shown below. In SQL, I would copy the rows of A that are missing from B with something like insert into B select * from A where id not in (select id from B), which would insert 3 George into table B.

How can I implement this in Hive?

Table A

id  name      
1   Rahul     
2   Keshav    
3   George

Table B

id  name      
1   Rahul     
2   Keshav    
4   Yogesh   
asked Jun 23 '17 06:06 by user8167344




1 Answer

NOT IN in the WHERE clause with uncorrelated subqueries has been supported since Hive 0.13, which was released more than 3 years ago, on 21 April 2014.

select * from A where id not in (select id from B where id is not null);

+----+--------+
| id |  name  |
+----+--------+
|  3 | George |
+----+--------+
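
To actually add the missing rows to B, as the question asks, the same filter can feed an INSERT ... SELECT. This is a minimal sketch, assuming A and B share the (id, name) column layout shown in the question; it has not been run against a specific Hive release.

```sql
-- Assumption: A and B both have exactly the columns (id, name), in that order.
-- Appends to B every row of A whose id does not already appear in B.
INSERT INTO TABLE B
SELECT A.id, A.name
FROM A
WHERE A.id NOT IN (SELECT id FROM B WHERE id IS NOT NULL);
```

With the sample data, this should append the single row 3 George to B.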

On earlier versions, the column of the outer table must be qualified with the table name or alias.

hive> select * from A where id not in (select id from B where id is not null);
FAILED: SemanticException [Error 10249]: Line 1:22 Unsupported SubQuery Expression 'id': Correlating expression cannot contain unqualified column references.

hive> select * from A where A.id not in (select id from B where id is not null);
OK
3   George

P.S.
When using NOT IN, you should add is not null to the inner query, unless you are 100% sure that the relevant column does not contain null values.
One null value is enough to cause your query to return no results: a comparison with null is never true, so id not in (...) cannot evaluate to true for any row.
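
If nulls in B.id are a concern, or the Hive version predates subquery support, the same rows can usually be found with a left outer join instead. This is a different technique from the NOT IN query above, sketched here against the tables from the question.

```sql
-- Anti-join: keep the rows of A that found no matching id in B.
-- A null in B.id never matches anything, so it cannot empty the result.
SELECT A.id, A.name
FROM A
LEFT OUTER JOIN B ON A.id = B.id
WHERE B.id IS NULL;
```

One difference to keep in mind: a row of A whose own id is null is returned by this join, whereas NOT IN would filter it out.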

answered Oct 09 '22 02:10 by David דודו Markovitz