Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Nested queries in Hive SQL

Tags:

sql

hive

I have a database, and I use a query to produce an intermediate table like this:

id    a      b    
xx    1      2    
yy    7      11    

and I would like to calculate the standard deviations of b for the users who have a < avg(a)

I calculate avg(a) that way and it works fine:

select avg(select a from (query to produce intermediate table)) from table;

But the query:

select stddev_pop(b) 
from (query to produce intermediate table) 
where a < (select avg(select a 
                     from (query to produce intermediate table))
          from table);

Returns me an error, and more precisely, I am told that the "a" from avg(select a from...) is not recognised. This makes me really confused, as it works in the previous query.

I would be grateful if somebody could help.

EDIT:

I stored the result of my query to generate the intermediary table into a temporary table, but still run into the same problem. The non working query becomes:

select stddev_pop(b) from temp where a < (select avg(a) from temp);

while this works:

select avg(a) from temp;
like image 258
S4M Avatar asked Nov 29 '12 17:11

S4M


1 Answers

OK, a colleague helped me to do it. I'll post the answer in case someone runs into the same problem:

select stddev_pop(b)
from temp x
join (select avg(a) as average from temp) y
where x.a < y.average;

Basically hive doesn't do caching of a table as a variable.

like image 115
S4M Avatar answered Oct 25 '22 11:10

S4M