Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to write subquery and use "In" Clause in Hive

Tags:

hadoop

hive

How can I use In clause in Hive I want to write something like this in Hive select x from y where y.z in (select distinct z from y) order by x; But I am not finding any way of doing it.. I tried In clause in Hive 0.7 it was throwing error, Also I tried Find_in_Set .. using find_in_set(y.z,subquery).. but the job is getting failed.

I want to do this on Hive. Please help me if anybody knows how to do this in Hive..

Thanks & Regards, Atul

like image 449
user722856 Avatar asked Oct 06 '11 16:10

user722856


People also ask

Can we use subquery in Hive?

Hive supports subqueries only in the FROM clause (through Hive 0.12). The subquery has to be given a name because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names.

Can subquery be used in FROM clause?

The FROM clause is the only context in which a subquery can specify the ORDER BY clause.

How do I use multiple subqueries in Hive?

Multiple subqueries allowed in hive. I tested with below code,it works. select * from (select id from test where id>10) a join (select id from test where id>20) b on a.id=b.id; Please post your exact code so that I can give relevant solution.


2 Answers

You can use semi join(https://cwiki.apache.org/Hive/languagemanual-joins.html):

LEFT SEMI JOIN implements the correlated IN/EXISTS subquery semantics in an efficient way. Since Hive currently does not support IN/EXISTS subqueries, you can rewrite your queries using LEFT SEMI JOIN. The restrictions of using LEFT SEMI JOIN is that the right-hand-side table should only be referenced in the join condition (ON-clause), but not in WHERE- or SELECT-clauses etc.

SELECT a.key, a.value
  FROM a
  WHERE a.key in
   (SELECT b.key
    FROM B);

can be rewritten to:

   SELECT a.key, a.val
   FROM a LEFT SEMI JOIN b on (a.key = b.key)
like image 118
www Avatar answered Sep 22 '22 05:09

www


Hive 0.13 now do support IN/EXISTS in the WHERE-clause .. The issue https://issues.apache.org/jira/browse/HIVE-784 has been resolved after 4 years :)

like image 45
Gaurav Kohli Avatar answered Sep 19 '22 05:09

Gaurav Kohli