
Handling NULL values in Hive

Tags: null, hive

I am trying to create a table (table2) in Hive from another table (table1). I want to exclude rows with NULL values and tried the following condition:

    insert overwrite table table2 partition (date = '2013-06-01')
    select column1, column2, ...
    from table1
    where column1 is not NULL or column1 <> '';

However, when I run the following query against the new table, I get 300+ rows with NULL values:

    select count(*) from table2 where column1 is NULL;

Could someone point out what is causing these NULL values?

Thank you.

Ravi

Ravi asked Aug 25 '13 19:08


2 Answers

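First, I don't think the condition column1 is not NULL or column1 <> '' makes very much sense. With OR, the second test can never exclude a row: a NULL satisfies neither test, and any non-NULL value, including the empty string, already passes the first one. Maybe you meant to write column1 is not NULL and column1 <> '' (AND instead of OR)?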
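
To see how the two conditions differ, one can evaluate both on a NULL, an empty string, and an ordinary value. This is only a sketch: the inline subquery and the aliases v, or_result and and_result are made up for illustration, and it assumes a Hive version that allows SELECT without a FROM clause (0.13 or later).

    -- Hypothetical three-row input: NULL, empty string, ordinary value.
    select v,
           (v is not NULL or  v <> '') as or_result,
           (v is not NULL and v <> '') as and_result
    from (
      select cast(NULL as string) as v
      union all
      select '' as v
      union all
      select 'abc' as v
    ) t;
    -- NULL  : or_result is NULL,  and_result is false
    -- ''    : or_result is true,  and_result is false
    -- 'abc' : or_result is true,  and_result is true

A WHERE clause keeps a row only when its condition is true, so the OR version lets the empty string through while the AND version drops it.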

Second, because of Hive's "schema on read" approach to table definitions, values that are invalid for a column's declared type are converted to NULL when you read them. So, for example, if table1.column1 is of type STRING and table2.column1 is of type INT, then I don't think that table1.column1 IS NOT NULL is enough to guarantee that table2.column1 IS NOT NULL. (I'm not sure about this, though.)
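
One way to check this is to compare the declared column types and then count the rows that would be lost in the conversion. The sketch below assumes, as in the STRING/INT example, that table2.column1 is an INT; substitute the real type if it differs.

    -- Compare the declared type of column1 in the two tables.
    describe table1;
    describe table2;

    -- In Hive, casting a string that is not a valid number yields NULL, not an error.
    select cast('' as int), cast('abc' as int);

    -- Rows that are not NULL in table1 but would become NULL as an INT
    -- (INT is an assumed target type, not taken from the question).
    select count(*)
    from table1
    where column1 is not NULL
      and cast(column1 as int) is NULL;

If the last count is close to the 300+ NULL rows seen in table2, the type mismatch is a likely cause.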

ruakh answered Oct 13 '22 10:10


Try adding a length > 0 check as well:

    column1 is not NULL AND column1 <> '' AND length(column1) > 0
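
Before rerunning the insert, it may help to count how many rows of table1 fall into each case, assuming column1 is a STRING there. The aliases null_rows, empty_rows and kept_rows are illustrative only.

    -- Break table1 down by what the filter above would do with each row.
    select
      sum(case when column1 is NULL then 1 else 0 end)     as null_rows,
      sum(case when column1 = '' then 1 else 0 end)        as empty_rows,
      sum(case when length(column1) > 0 then 1 else 0 end) as kept_rows
    from table1;

Here kept_rows is the number of rows the condition above would let through.
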
ShikharDua answered Oct 13 '22 10:10