My understanding of the following statement is that if blank or empty string is inserted into hive column, it will be treated as null.
TBLPROPERTIES('serialization.null.format'=''
To test the functionality i have created a table and insertted '' to the filed 3. When i query for nulls on the field3, there are no rows with that criteria.
Is my understanding of making blank string to null correct??
CREATE TABLE CDR
(
field1                 string,
field2                 string,
field3                 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
**TBLPROPERTIES('serialization.null.format'='');**
insert overwrite table emmtest.cdr select **field1,field2,''** from emmtest.cdr_non_orc;
select * from emmtest.cdr where **field3 is null;**
The last statement has not returned any rows. But i am expecting all rows to be returned since there is blank string in field3.
TBLPROPERTIES('serialization.null.format'='') means the following:
You are doing something else -
You are inserting an empty string to a table from a query.
It is treated "as is" - an empty string.
bash
hdfs dfs -mkdir /user/hive/warehouse/mytable
echo Hello,,World | hdfs dfs -put - /user/hive/warehouse/mytable/data.txt
hive
create table mytable (s1 string,s2 string,s3 string) 
row format delimited 
fields terminated by ','
;
hive> select * from mytable;
OK
s1  s2  s3
Hello       World
hive> alter table mytable set tblproperties ('serialization.null.format'='');
OK
hive> select * from mytable;
OK
s1  s2  s3
Hello   NULL    World
                        You can use the following in your Hive Query properties:
NULL DEFINED AS ''
or any character inside the quotes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With