Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive collect_list() does not collect NULL values

I am trying to collect a column with NULLs along with some values in that column...But collect_list ignores the NULLs and collects only the ones with values in it. Is there a way to retrieve the NULLs along with other values ?

SELECT col1, col2, collect_list(col3) as col3
FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3)
GROUP BY col1, col2;

Actual col3 values

0.9
NULL
NULL
0.7
0.6 

Resulting col3 values

[0.9, 0.7, 0.6]

I was hoping that there is a hive solution that looks like this [0.9, NULL, NULL, 0.7, 0.6] after applying the collect_list.

like image 861
lalith kkvn Avatar asked Aug 12 '15 04:08

lalith kkvn


1 Answers

This function works like this, but I've found the following workaround. Add a case when statement to your query to check and keep NULLs.

SELECT col1, 
    col2, 
    collect_list(CASE WHEN col3 IS NULL THEN 'NULL' ELSE col3 END) as col3
FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3)
GROUP BY col1, col2

Now, because you had a string element ('NULL') the whole result set is an array of strings. At the end just convert the array of strings to an array of double values.

like image 101
Zsuzsa Avatar answered Sep 20 '22 01:09

Zsuzsa