
How to load CSV data with enclosed by double quotes and separated by tab into HIVE table?


I am trying to load data from a CSV file in which the values are enclosed in double quotes ('"') and separated by tabs ('\t'). When I load it into Hive, no error is thrown, but I think all the data ends up in a single column, and most of the values show as NULL. Below is my CREATE TABLE statement.

CREATE TABLE example (
    organization STRING,
    order        BIGINT,
    created_on   TIMESTAMP,
    issue_date   TIMESTAMP,
    qty          INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t' ESCAPED BY '"'
STORED AS TEXTFILE;

Input file sample:

"Organization"  "Order"  "Created on"  "issue_date"  "qty"
"GB"  "111223"  "2015/02/06 00:00:00"  "2015/05/15 00:00:00"  "5"
"UK"  "1110"  "2015/05/06 00:00:00"  "2015/06/1 00:00:00"  "51"

And this is the LOAD statement I use to push the data into the Hive table:

 LOAD DATA INPATH '/user/example.csv' OVERWRITE INTO TABLE example 

What could be the issue, and how can I ignore the header of the file? If I remove ESCAPED BY '"' from the CREATE statement, the data loads into the respective columns, but all the values are still enclosed in double quotes. How can I remove the double quotes from the values and ignore the header of the file?

Sharad asked Jun 04 '15 07:06





1 Answer

You can now use OpenCSVSerde, which lets you define the separator character and strips the surrounding double quotes from each value:

CREATE EXTERNAL TABLE example (
    organization STRING,
    `order`      BIGINT,  -- ORDER is a reserved word in Hive, so it is quoted with backticks
    created_on   TIMESTAMP,
    issue_date   TIMESTAMP,
    qty          INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    "separatorChar" = "\t",  -- fields are tab separated
    "quoteChar"     = "\""   -- values are wrapped in double quotes
)
LOCATION '/your/folder/location/';
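
The statement above does not skip the header row, and as far as I know OpenCSVSerde exposes every column as STRING regardless of the declared type. Here is a minimal sketch of one way to handle both, assuming Hive 0.14 or later. The names example_raw and example_typed are hypothetical, and the 'yyyy/MM/dd HH:mm:ss' pattern is only inferred from the sample rows in the question:

```sql
-- Raw table: every column is declared STRING, since the SerDe returns strings anyway.
-- example_raw is a hypothetical name used for illustration.
CREATE EXTERNAL TABLE example_raw (
    organization STRING,
    `order`      STRING,
    created_on   STRING,
    issue_date   STRING,
    qty          STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    "separatorChar" = "\t",
    "quoteChar"     = "\""
)
STORED AS TEXTFILE
LOCATION '/your/folder/location/'
-- Ignore the first line of each file (the header row).
TBLPROPERTIES ("skip.header.line.count" = "1");

-- Typed view on top of the raw table (example_typed is also a hypothetical name).
-- The date pattern is an assumption based on values like "2015/02/06 00:00:00".
CREATE VIEW example_typed AS
SELECT
    organization,
    CAST(`order` AS BIGINT) AS `order`,
    CAST(from_unixtime(unix_timestamp(created_on, 'yyyy/MM/dd HH:mm:ss')) AS TIMESTAMP) AS created_on,
    CAST(from_unixtime(unix_timestamp(issue_date, 'yyyy/MM/dd HH:mm:ss')) AS TIMESTAMP) AS issue_date,
    CAST(qty AS INT) AS qty
FROM example_raw;
```

Querying example_typed should then give unquoted, typed values without the header line. It is worth spot-checking rows with irregular dates such as "2015/06/1 00:00:00", since how strictly the pattern is applied may differ between Hive versions.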
cheseaux answered Oct 22 '22 01:10
