Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Handling null values on hive

Tags:

hadoop

hive

I have a column in hive with type double, but some of the rows are NULL when I do:

select columnA from table;

now, if I run the following, I get 0 for both queries:

select count(*) from table where columnA = "NULL";
select count(*) from table where columnA = NULL;

how can I count the rows in my table which are NULL?

like image 474
user1745713 Avatar asked Mar 04 '26 02:03

user1745713


2 Answers

The correct query is:

select count(*) from table where columnA is null;
like image 61
Olaf Avatar answered Mar 07 '26 11:03

Olaf


In Hive, count(*) counts all rows and count(columnA) will only count rows where columnA is non-NULL. If you would like to do multiple columns you could write the query as:

select count(*)-count(columnA), count(*)-count(columnB) from table;

and get the number of null values in each column. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

like image 37
Aaron Avatar answered Mar 07 '26 09:03

Aaron



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!