Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL loading a tab-delimited file into a table with special characters

Tags:

sql

mysql

I want to be able to load a file into a MySQL table that has backslashes in it like this:

Fred          Los Angeles        I am Fred
Nick          Madison            Great Lakes Whoo
JOHN          San Diego          Hello world!\
Bob           NYC                Big apple            

User JOHN put a \ at the end of column 3. When I load this into a table, MySQL is interpreting the backslash as some kind of line continuation and concatenating Row 3, Col 3) with (Row 4, Col1).

I can't just copy and paste to fix it since there are millions of rows like this with backslashes in various places causing havoc.

What is the proper way to load a tab-delimited file into a MySQL table with backslashes? Should I be enclosing each (column) field in quotes? Should I write a parser that removes special characters?

like image 517
ktm5124 Avatar asked Mar 21 '13 00:03

ktm5124


2 Answers

You have to specify ESCAPED BY option when loading your file. In the following example I'm assuming your fields are divided by tab (which is default behavior) and file is located on clients host:

LOAD DATA LOCAL INFILE '/path/to/file/my.txt' INTO TABLE t
FIELDS TERMINATED BY '\t' ESCAPED BY '\b';
like image 79
b.b3rn4rd Avatar answered Oct 27 '22 01:10

b.b3rn4rd


How to tame backslashes in mysql load data infile tool:

Step 1, create your table:

mysql> create table penguin (id int primary key, chucknorris VARCHAR(4000));
Query OK, 0 rows affected (0.01 sec)

Step 2, create your file to import and put this data in there.

1   aliens are on route
2   scramble the nimitz\
3   \its species 8472
4   \\\\\\\\\\\\\\\\\\
5   Bonus characters:!@#$%^&*()_+=-[]\|}{;'":/.?>,< anything but tab

Step 3, insert into your table:

mysql> load data local infile '/home/el/foo/textfile.txt' into table penguin 
       fields terminated by '\t' lines terminated by '\n' 
       (@col1, @col2) set id=@col1, chucknorris=@col2;
Query OK, 4 rows affected, 1 warning (0.00 sec)
Records: 4  Deleted: 0  Skipped: 0  Warnings: 1

Step 4, and of course, it causes this weird output:

mysql> select * from penguin;
+----+-----------------------------------------------------------------+
| id | chucknorris                                                     |
+----+-----------------------------------------------------------------+
|  1 | aliens are on route                                             |
|  2 | scramble the nimitz
3                                           |
|  4 | \\\\\\\\\                                                       |
|  5 | Bonus characters:!@#$%^&*()_+=-[]|}{;'":/.?>,< anything but tab |
+----+-----------------------------------------------------------------+

Step 5, analyze the warning:

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+------------------------------------- ------------------+
| Warning | 1262 | Row 2 was truncated; it contained more data than there |
|         |      | were input columns                                     |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)

Step 6, think about exactly what went wrong:

The backslash to the left of nimitz caused the mysql load data parser to concatenate the end of line 2 with the beginning of line 3. Then it bumped up against a tab and put 'scramble the nimitz\n3 into row 2.

The rest of row 3 is skipped because the extra words its species 8472 do not fit anywhere, it produces the warning you see above.

Row 4 had 18 backslashes, so there is no problem, and shows up as 9 backslahes because each was escaped. Had there been an odd number, the error on row 2 would have happened to row 4.

The bonus characters on row 5 came through normally. Everything is allowed except tab.

Step 7, reset penguin:

mysql> delete from penguin;

Step 8, load into your table with the fields escaped by clause:

mysql> load data local infile '/home/el/foo/textfile.txt' into table penguin 
       fields terminated by '\t' escaped by '\b' 
       lines terminated by '\n' (@col1, @col2) set id=@col1, 
       chucknorris=@col2;

Query OK, 5 rows affected (0.00 sec)
Records: 5  Deleted: 0  Skipped: 0  Warnings: 0

Step 9, select from your table, interpret the results:

mysql> select * from penguin;
+----+------------------------------------------------------------------+
| id | chucknorris                                                      |
+----+------------------------------------------------------------------+
|  1 | aliens are on route                                              |
|  2 | scramble the nimitz\                                             |
|  3 | \its species 8472                                                |
|  4 | \\\\\\\\\\\\\\\\\\                                               |
|  5 | Bonus characters:!@#$%^&*()_+=-[]\|}{;'":/.?>,< anything but tab |
+----+------------------------------------------------------------------+
5 rows in set (0.00 sec)

And now everything is as we expect. The backslash at the end of line 2 does not escape the newline. The backslash before i on row 3 doesn't do anything. The 18 backslashes on row 4 are not escaped. And the bonus characters come through ok.

like image 38
Eric Leschinski Avatar answered Oct 27 '22 00:10

Eric Leschinski