I am trying to load data from a csv file in which the values are enclosed by double quotes '"' and tab separated '\t' . But when I try to load that into hive its not throwing any error and data is loaded without any error but I think all the data is getting loaded into a single column and most of the values it showing as NULL. below is my create table statement.
CREATE TABLE example ( organization STRING, order BIGINT, created_on TIMESTAMP, issue_date TIMESTAMP, qty INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '"' STORED AS TEXTFILE;
Input file sample;-
"Organization" "Order" "Created on" "issue_date" "qty" "GB" "111223" "2015/02/06 00:00:00" "2015/05/15 00:00:00" "5" "UK" "1110" "2015/05/06 00:00:00" "2015/06/1 00:00:00" "51"
and Load statement to push data into hive table.
LOAD DATA INPATH '/user/example.csv' OVERWRITE INTO TABLE example
What could be the issue and how can I ignore header of the file. and if I remove ESCAPED BY '"' from create statement its loading in respective columns but all the values are enclosed by double quotes. How can I remove double quotes from values and ignore header of the file?
There are 2 accepted ways of escaping double-quotes in a CSV file. One is using a 2 consecutive double-quotes to denote 1 literal double-quote in the data. The alternative is using a backslash and a single double-quote.
You can now use OpenCSVSerde which allows you to define the separator character and easily escape surrounding double-quotes :
CREATE EXTERNAL TABLE example ( organization STRING, order BIGINT, created_on TIMESTAMP, issue_date TIMESTAMP, qty INT ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = "\t", "quoteChar" = "\"" ) LOCATION '/your/folder/location/';
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With