I have a bunch of CSV data that I need to load into a MySQL database. Well, CSV-ish, perhaps. (edit: actually, it looks like the stuff described in RFC 4180)
Each row is a list of comma-separated doublequoted strings. To escape any doublequotes that appear within a column value, double doublequotes are used. Backslashes are allowed to represent themselves.
For example, the line:
"", "\wave\", ""hello,"" said the vicar", "what are ""scare-quotes"" good for?", "I'm reading ""Bossypants"""
if parsed into JSON should be:
[ "", "\\wave\\", "\"hello,\" said the vicar", "what are \"scare-quotes\" good for?", "I'm reading \"Bossypants\"" ]
I'm trying to use the LOAD DATA
to read the CSV in, but I'm running into some weird behaviour.
As an example, consider if I have a simple two column table
shell% mysql exampledb -e "describe person"
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| ID | int(11) | YES | | NULL | |
| UID | char(255) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
shell%
If the first non-header line of my input file ends on ""
:
shell% cat temp-1.csv
"ID","UID"
"9",""
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"10",""
"5","Simon"
"6","Sonny"
"7","Wat\"
I can either load every non-header line but the first:
mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)
mysql> LOAD DATA
LOCAL INFILE 'temp-1.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 1 LINES
;
Query OK, 9 rows affected (0.00 sec)
Records: 9 Deleted: 0 Skipped: 0 Warnings: 0
mysql> SELECT * FROM person;
+------+------------------------+
| ID | UID |
+------+------------------------+
| 0 | Steve the Pirate |
| 10 | |
| 1 | \Alpha |
| 2 | Hoban "Wash" Washburne |
| 3 | Pastor Veal |
| 4 | Tucker |
| 5 | Simon |
| 6 | Sonny |
| 7 | Wat\ |
+------+------------------------+
9 rows in set (0.00 sec)
Or I can load all lines including the header:
mysql> DELETE FROM person;
Query OK, 9 rows affected (0.00 sec)
mysql> LOAD DATA
LOCAL INFILE 'temp-1.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 0 LINES
;
Query OK, 11 rows affected, 1 warning (0.01 sec)
Records: 11 Deleted: 0 Skipped: 0 Warnings: 1
mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM person;
+------+------------------------+
| ID | UID |
+------+------------------------+
| 0 | UID |
| 9 | |
| 0 | Steve the Pirate |
| 10 | |
| 1 | \Alpha |
| 2 | Hoban "Wash" Washburne |
| 3 | Pastor Veal |
| 4 | Tucker |
| 5 | Simon |
| 6 | Sonny |
| 7 | Wat\ |
+------+------------------------+
11 rows in set (0.00 sec)
If no lines of my input file end on ""
:
shell% cat temp-2.csv
"ID","UID"
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"5","Simon"
"6","Sonny"
"7","Wat\"
then I can either load no lines:
mysql> DELETE FROM person;
Query OK, 11 rows affected (0.00 sec)
mysql> LOAD DATA
LOCAL INFILE 'temp-2.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 1 LINES
;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Deleted: 0 Skipped: 0 Warnings: 0
mysql> SELECT * FROM person;
Empty set (0.00 sec)
Or I can load all the lines including the header:
mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)
mysql> LOAD DATA
LOCAL INFILE 'temp-2.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 0 LINES
;
Query OK, 9 rows affected, 1 warning (0.03 sec)
Records: 9 Deleted: 0 Skipped: 0 Warnings: 1
mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM person;
+------+------------------------+
| ID | UID |
+------+------------------------+
| 0 | UID |
| 0 | Steve the Pirate |
| 1 | \Alpha |
| 2 | Hoban "Wash" Washburne |
| 3 | Pastor Veal |
| 4 | Tucker |
| 5 | Simon |
| 6 | Sonny |
| 7 | Wat\ |
+------+------------------------+
9 rows in set (0.00 sec)
So now that I've discovered many ways to do it wrong, how can I use LOAD DATA
to import the data from these files into my database?
By default, the escape character is a " (double quote) for CSV-formatted files. If you want to use a different escape character, use the ESCAPE clause of COPY , CREATE EXTERNAL TABLE or gpload to declare a different escape character.
There are 2 accepted ways of escaping double-quotes in a CSV file. One is using a 2 consecutive double-quotes to denote 1 literal double-quote in the data. The alternative is using a backslash and a single double-quote.
Yes. You can import double quotation marks using CSV files and import maps by escaping the double quotation marks. To escape the double quotation marks, enclose them within another double quotation mark.
Actually, the only value to escape is double quotes symbol. All other cell content gets into it and displayed correctly in Excel. Checked with various versions of Excel and ODBC CSV parsers in Cyrillic locale under Windows. Save this answer.
According to the documentation for LOAD DATA
, treating doubled double quotes as a double quote is the default:
If the field begins with the ENCLOSED BY character, instances of that character are recognized as terminating a field value only if followed by the field or line TERMINATED BY sequence. To avoid ambiguity, occurrences of the ENCLOSED BY character within a field value can be doubled and are interpreted as a single instance of the character. For example, if ENCLOSED BY '"' is specified, quotation marks are handled as shown here:
"The ""BIG"" boss" -> The "BIG" boss The "BIG" boss -> The "BIG" boss The ""BIG"" boss -> The ""BIG"" boss
So all I need to do is disable interpreting \
as an escape character, by using ESCAPED BY ''
.
LOAD DATA
LOCAL INFILE 'temp-1.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY ''
LINES
TERMINATED BY '\n'
IGNORE 1 LINES
;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With