I have a column in hive with type double, but some of the rows are NULL when I do:
select columnA from table;
now, if I run the following, I get 0 for both queries:
select count(*) from table where columnA = "NULL";
select count(*) from table where columnA = NULL;
how can I count the rows in my table which are NULL?
The correct query is:
select count(*) from table where columnA is null;
In Hive, count(*) counts all rows and count(columnA) will only count rows where columnA is non-NULL. If you would like to do multiple columns you could write the query as:
select count(*)-count(columnA), count(*)-count(columnB) from table;
and get the number of null values in each column. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With