Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Empty String is not treated as null in Hive

My understanding of the following statement is that if blank or empty string is inserted into hive column, it will be treated as null.

TBLPROPERTIES('serialization.null.format'=''

To test the functionality i have created a table and insertted '' to the filed 3. When i query for nulls on the field3, there are no rows with that criteria.

Is my understanding of making blank string to null correct??

CREATE TABLE CDR
(
field1                 string,
field2                 string,
field3                 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
**TBLPROPERTIES('serialization.null.format'='');**

insert overwrite table emmtest.cdr select **field1,field2,''** from emmtest.cdr_non_orc;

select * from emmtest.cdr where **field3 is null;**

The last statement has not returned any rows. But i am expecting all rows to be returned since there is blank string in field3.

like image 958
AKC Avatar asked Dec 24 '22 19:12

AKC


2 Answers

TBLPROPERTIES('serialization.null.format'='') means the following:

  • An empty field in the data files will be treated as NULL when you query the table
  • When inserting rows to the table, NULL values will be written to the data files as empty fields

You are doing something else -
You are inserting an empty string to a table from a query.
It is treated "as is" - an empty string.

Demo

bash

hdfs dfs -mkdir /user/hive/warehouse/mytable
echo Hello,,World | hdfs dfs -put - /user/hive/warehouse/mytable/data.txt

hive

create table mytable (s1 string,s2 string,s3 string) 
row format delimited 
fields terminated by ','
;

hive> select * from mytable;
OK
s1  s2  s3
Hello       World

hive> alter table mytable set tblproperties ('serialization.null.format'='');
OK

hive> select * from mytable;
OK
s1  s2  s3
Hello   NULL    World
like image 199
David דודו Markovitz Avatar answered Dec 26 '22 08:12

David דודו Markovitz


You can use the following in your Hive Query properties:

NULL DEFINED AS ''

or any character inside the quotes.

like image 41
John R. Martinez Avatar answered Dec 26 '22 09:12

John R. Martinez