I am trying to load a CSV file into a Hive table like so:
CREATE TABLE mytable (
  num1 INT, text1 STRING, num2 INT, text2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

LOAD DATA LOCAL INPATH '/data.csv' OVERWRITE INTO TABLE mytable;
The CSV is delimited by a comma (,) and looks like this:
1, "some text, with comma in it", 123, "more text"
This loads corrupted data, since there is a ',' inside the first quoted string.
Is there a way to set a text qualifier, or to make Hive ignore the ',' inside quoted strings?
I can't change the delimiter of the CSV since it gets pulled from an external source.
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record, and each record consists of one or more fields separated by commas; the comma as field separator is what gives the format its name.
ROW FORMAT DELIMITED tells Hive which characters terminate the fields and lines; in the example above the fields are terminated by a comma (","). The catch is that a plain delimited format has no notion of quoting, so a comma inside a quoted value is still treated as a field separator.
You need to specify a text qualifier. Generally a double quote (") is used as the text qualifier: the field value is enclosed in it, and any commas inside the quoted text are ignored. This is the standard way CSV handles embedded delimiters across languages and platforms.
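In Hive, the usual way to get that quote handling is the OpenCSV-based SerDe that ships with Hive 0.14 and later (org.apache.hadoop.hive.serde2.OpenCSVSerde). A minimal sketch, assuming that class is available in your distribution (the table name mytable_csv is just illustrative):

-- OpenCSVSerde understands a separator, a quote character and an escape character
CREATE TABLE mytable_csv (
  num1 STRING, text1 STRING, num2 STRING, text2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/data.csv' OVERWRITE INTO TABLE mytable_csv;

Note that OpenCSVSerde reads every column as STRING, so the INT columns have to be cast when you query them, or copied into a properly typed table.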
If you can re-create or pre-process your input data, you can specify an escape character in the CREATE TABLE:
ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ESCAPED BY '\\';
Hive will then accept this line as 4 fields:
1,some text\, with comma in it,123,more text
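Put together, a sketch of this approach, assuming the input file has already been rewritten with backslash-escaped commas as in the line above:

CREATE TABLE mytable (
  num1 INT, text1 STRING, num2 INT, text2 STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\';

LOAD DATA LOCAL INPATH '/data.csv' OVERWRITE INTO TABLE mytable;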
The problem is that Hive doesn't handle quoted text in its default delimited format. You either need to pre-process the data by changing the delimiter between the fields (e.g. with a Hadoop-streaming job), or you can try a custom CSV SerDe which uses OpenCSV to parse the files.
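One way to combine both suggestions inside Hive itself, assuming the OpenCSVSerde shown earlier is available, is to treat the SerDe table as a string-typed staging table and cast into the final typed table (mytable_csv and the column names here follow the earlier sketch):

-- copy from the all-STRING staging table into the typed target table
INSERT OVERWRITE TABLE mytable
SELECT CAST(num1 AS INT), text1, CAST(num2 AS INT), text2
FROM mytable_csv;

This keeps the quoted commas intact during parsing and still gives you proper INT columns to query.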