Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to write case and group by in hive query

This is my hive table:

course   dept    subject   status

btech     cse     java     pass
btech     cse     hadoop   fail
btech     cse     cg       detained
btech     cse     cc       pass
btech      it     daa      pass
btech      it     wt       pass
btech      it     cnn      pass
mba        hr     hrlaw    pass
mba        hr     hrguid   absent
mtech      cs     java     pass
mtech      cs     cd       pass
mtech      cs     cp       detained

I want to query this table to retrieve data in the following way:

course   dept    status

btech     cse     fail
btech      it     pass
mba        hr     absent
mtech      cs     fail

First, it will check for "fail" or "detained" in the status of each dept and course grouped together. If it finds "fail" or "detained", it will output "fail" as the status. Else, if an "absent" is found in the same group, it will output "absent" as the status. Else, it will output "pass".

I got an error message when I ran the following query:

select course,dept,
case 
when status in ( 'fail','detained') then 'fail'
when status in ( 'absent') then 'absent'
when status in ( 'pass') then 'pass'
else null 
end as Final_Status
from college
group by course,dept;
like image 827
priyanka Avatar asked May 25 '16 14:05

priyanka


3 Answers

When you group by course and dept, you would get multiple values (comming for different records) for status column, this needs to be handled.
Any column in the select which is not part of group by should be within an aggregate function
here is a solution using sum() function.

select course, dept,
    case when sum(case when status in ( 'fail','detained') then 1 else 0 end) > 0 then 'fail'
         when sum(case when status in ('absent') then 1 else 0 end) > 0 then 'absent'
         when sum(case when status in ('pass') then 1 else 0 end) > 0 then 'pass'
         else 'no_result'
    end as final_status
from college
group by 
    course,dept
like image 111
b1n0ys Avatar answered Nov 14 '22 03:11

b1n0ys


If I understood correctly, you want something like:

select course,dept,
case 
when status in ( 'fail','detained') then 'FAILED'
when status in ( 'absent') then 'absent'
when status in ( 'pass') then 'PASSED'
else null 
end as Final_Status
from college
group by course,dept, 
   CASE when status in ( 'fail','detained') then 'FAILED'
   when status in ( 'absent') then 'absent'
   when status in ( 'pass') then 'PASSED'
   else null END;

I am using CASE in GROUP and it works fine with Hive.

like image 26
Thomas Decaux Avatar answered Nov 14 '22 03:11

Thomas Decaux


Try this.

select course,dept,
collect_set(
case 
when status in ( 'fail','detained') then 'FAILED'
when status in ( 'absent') then 'absent'
when status in ( 'pass') then 'PASSED'
else null 
end ) as Final_Status
from college
group by course,dept;
like image 44
Lokesh Avatar answered Nov 14 '22 02:11

Lokesh