When I do a select statement for non-null values on a hive table, there are no correct results in the response. The result is as if "is not null" expression is not there!
Example:
select count(*)
from test_table
where test_col_id1='12345' and test_col_id2 is not null;
Note test_col_id1
and test_col_id2
are not partition keys.
Here's my hive version.
Hive 0.14.0.2.2.0.0-2041
Here's the table:
... | test_col_id1 | test_col_id2 |
... | 12345 | x |
... | 12345 | NULL |
This query returns 2 records.
Try the following query, does it return rows?
select count(*)
from test_table
where test_col_id1='12345' and test_col_id2 != 'NULL';
Then your NULL
is not NULL
, it's the string 'NULL'. Loads of people have problems with Hive treatment of NULL
strings. By default, it's the blank string ''
. If we want anything else, we have to specify exactly the way NULL strings should be treated when we create the table. Here are 3 examples of how to change what is recognized as NULL
. The first one sets 'NULL' strings as NULL
:
CREATE TABLE nulltest1 (id STRING, another_string STRING)
TBLPROPERTIES('serialization.null.format'='NULL') --sets the string 'NULL' as NULL;
CREATE TABLE nulltest2 (id STRING, another_string STRING)
TBLPROPERTIES('serialization.null.format'='') --sets empty string as NULL;
CREATE TABLE nulltest3 (id STRING, another_string STRING)
TBLPROPERTIES('serialization.null.format'='\N'); --sets \N as NULL;
Since you've already created your table, you can alter your table so that it will recognize your 'NULL'
as NULL
:
ALTER TABLE test_table SET TBLPROPERTIES ('serialization.null.format' = 'NULL');
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With