I want to be able to load a file into a MySQL table that has backslashes in it like this:
Fred Los Angeles I am Fred
Nick Madison Great Lakes Whoo
JOHN San Diego Hello world!\
Bob NYC Big apple
User JOHN put a \
at the end of column 3. When I load this into a table, MySQL is interpreting the backslash as some kind of line continuation and concatenating Row 3, Col 3) with (Row 4, Col1).
I can't just copy and paste to fix it since there are millions of rows like this with backslashes in various places causing havoc.
What is the proper way to load a tab-delimited file into a MySQL table with backslashes? Should I be enclosing each (column) field in quotes? Should I write a parser that removes special characters?
You have to specify ESCAPED BY
option when loading your file. In the following example I'm assuming your fields are divided by tab (which is default behavior) and file is located on clients host:
LOAD DATA LOCAL INFILE '/path/to/file/my.txt' INTO TABLE t
FIELDS TERMINATED BY '\t' ESCAPED BY '\b';
Step 1, create your table:
mysql> create table penguin (id int primary key, chucknorris VARCHAR(4000));
Query OK, 0 rows affected (0.01 sec)
Step 2, create your file to import and put this data in there.
1 aliens are on route
2 scramble the nimitz\
3 \its species 8472
4 \\\\\\\\\\\\\\\\\\
5 Bonus characters:!@#$%^&*()_+=-[]\|}{;'":/.?>,< anything but tab
Step 3, insert into your table:
mysql> load data local infile '/home/el/foo/textfile.txt' into table penguin
fields terminated by '\t' lines terminated by '\n'
(@col1, @col2) set id=@col1, chucknorris=@col2;
Query OK, 4 rows affected, 1 warning (0.00 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 1
Step 4, and of course, it causes this weird output:
mysql> select * from penguin;
+----+-----------------------------------------------------------------+
| id | chucknorris |
+----+-----------------------------------------------------------------+
| 1 | aliens are on route |
| 2 | scramble the nimitz
3 |
| 4 | \\\\\\\\\ |
| 5 | Bonus characters:!@#$%^&*()_+=-[]|}{;'":/.?>,< anything but tab |
+----+-----------------------------------------------------------------+
Step 5, analyze the warning:
mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------- ------------------+
| Warning | 1262 | Row 2 was truncated; it contained more data than there |
| | | were input columns |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)
Step 6, think about exactly what went wrong:
The backslash to the left of nimitz
caused the mysql load data parser to concatenate the end of line 2 with the beginning of line 3. Then it bumped up against a tab and put 'scramble the nimitz\n3 into row 2.
The rest of row 3 is skipped because the extra words its species 8472
do not fit anywhere, it produces the warning you see above.
Row 4 had 18 backslashes, so there is no problem, and shows up as 9 backslahes because each was escaped. Had there been an odd number, the error on row 2 would have happened to row 4.
The bonus characters on row 5 came through normally. Everything is allowed except tab.
Step 7, reset penguin:
mysql> delete from penguin;
Step 8, load into your table with the fields escaped by
clause:
mysql> load data local infile '/home/el/foo/textfile.txt' into table penguin
fields terminated by '\t' escaped by '\b'
lines terminated by '\n' (@col1, @col2) set id=@col1,
chucknorris=@col2;
Query OK, 5 rows affected (0.00 sec)
Records: 5 Deleted: 0 Skipped: 0 Warnings: 0
Step 9, select from your table, interpret the results:
mysql> select * from penguin;
+----+------------------------------------------------------------------+
| id | chucknorris |
+----+------------------------------------------------------------------+
| 1 | aliens are on route |
| 2 | scramble the nimitz\ |
| 3 | \its species 8472 |
| 4 | \\\\\\\\\\\\\\\\\\ |
| 5 | Bonus characters:!@#$%^&*()_+=-[]\|}{;'":/.?>,< anything but tab |
+----+------------------------------------------------------------------+
5 rows in set (0.00 sec)
And now everything is as we expect. The backslash at the end of line 2 does not escape the newline. The backslash before i
on row 3 doesn't do anything. The 18 backslashes on row 4 are not escaped. And the bonus characters come through ok.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With