Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combing the data of a table in hive

Need to combine the data in a hive table in one row. The intention is to capture the data/ value other than 'N' i.e. whatever value is present other than 'N' should be captured for all the 'col1' values

Table1:

col1 col2 col3 col4 col5 col6
-----------------------------
GHY   BG  Q    N    N    N
GHY   BG  N    T    N    N
GHY   BG  N    N    A    N
GHY   BG  N    N    N    Z

Tried with the following query:

Select col1, col2,array(
max(CASE WHEN col3 == 'Q' THEN 'Q' ELSE 'None' END),
max(CASE WHEN col4 == 'T' THEN 'T' ELSE 'None' END),
max(CASE WHEN col5 == 'A' THEN 'A' ELSE 'None' END),
max(CASE WHEN col6 == 'Z' THEN 'Z' ELSE 'None' END))
FROM table1 GROUP BY col1,col2;

and got the below:

Actual O/P:

GHY BG ['None','None','A','None']

Expected O/P:

GHY BG ['Q','T','A','Z']

Not getting the point of error :(

Update_1:

After removing 'max' from the query:

FAILED: SemanticException [Error 10025]: Line 2:11 Expression not in GROUP BY key 'Q'

Update_2:

select col1,col2,collect_set(col)
from (select col1,col2,t.col
      from tbl 
      lateral view explode(array(col3,col4,col5,col6)) t as col
      where t.col <> 'N'
     ) t

Error:

FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'col1'
like image 720
coolbreeze Avatar asked Dec 17 '22 18:12

coolbreeze


1 Answers

Use explode to get one row per column for a combination of col1,col2 and aggregate them with a collect_set.

select col1,col2,collect_set(col)
from (select col1,col2,t.col
      from tbl 
      lateral view explode(array(col3,col4,col5,col6)) t as col
      where t.col <> 'N'
     ) t
group by col1,col2
like image 148
Vamsi Prabhala Avatar answered Jan 04 '23 14:01

Vamsi Prabhala