I am getting the null values while loading the data from flat files into hive tables.
my tables structure is like this:
hive> create table test_hive (id int,value string);
and my flat file is like this:
input.txt
1 a
2 b
3 c
4 d
5 e
6 F
7 G
8 j
when I am running the below commands I am getting null values:
hive> LOAD DATA LOCAL INPATH '/home/hduser/input.txt' OVERWRITE INTO TABLE test_hive;
hive> select * from test_hive;
OK<br>
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
screen shot:
hive> create table test_hive (id int,value string);
OK
Time taken: 4.97 seconds
hive> show tables;
OK
test_hive
Time taken: 0.124 seconds
hive> LOAD DATA LOCAL INPATH '/home/hduser/input2.txt' OVERWRITE INTO TABLE test_hive;
Copying data from file:/home/hduser/input2.txt
Copying file: file:/home/hduser/input2.txt
Loading data to table default.test_hive
Deleted hdfs://hydhtc227141d:54310/app/hive/warehouse/test_hive
OK
Time taken: 0.572 seconds
hive> select * from test_hive;
OK
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
Time taken: 0.182 seconds
Use nvl() function in Hive to replace all NULL values of a column with a default value, In this article, I will explain with an example. Replace all NULL values with -1 or 0 or any number for the integer column. Replace all NULL values with empty space for string types. Replace with any value based on your need.
The HIVE_DEFAULT_PARTITION in hive is represented by a NULL value of the partitioned column. That means, if we have a NULL value for a partition column and loading this record to a partitioned table, then hive_default_partition will get create for that record.
The hive nvl function is one of the same functions. We can use the nvl function as the keyword in the hive query. It will update, we need to replace the null value in the table with the specific value. With the help of the nvl keyword, we can easily replace the null values from the hive table.
The default field terminator in Hive is ^A. You need to explicitly mention in your create table statement that you are using a different field separator.
Similar to what Lorand Bending pointed in the comment, use:
CREATE TABLE test_hive(id INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
You don't need to specify a location since you are creating a managed table (and not an external table).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With