
utf8 garbled when importing into mysql

Tags:

mysql

utf-8

Importing UTF-8 encoded data into MySQL is not working for me: UTF-8 characters are corrupted. For example, Nöthnagel is displayed as NÃ¶thnagel.

To do the import, I have created an SQL dump file containing UTF-8 encoded data. For example:

INSERT INTO `users` VALUES(1, 'Fred','Nöthnagel'); 

The sequence of bytes representing ö in the file is c3 b6, which I believe is correct, as it displays correctly in vim and in my bash shell, which has these environment variables set:
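A quick way to confirm the dump really is UTF-8, rather than relying on how vim or the terminal renders it, is to decode the raw bytes strictly. The Python sketch below does that and counts the c3 b6 sequences; the file name mydump.sql is from the question, while the helper names are hypothetical.

```python
from pathlib import Path

UMLAUT_O = "ö".encode("utf-8")  # U+00F6 encodes to the two bytes c3 b6

def inspect_bytes(raw):
    """Return (is_valid_utf8, number_of_c3_b6_sequences) for raw bytes."""
    try:
        raw.decode("utf-8")  # strict by default: raises on any invalid sequence
        valid = True
    except UnicodeDecodeError:
        valid = False
    return valid, raw.count(UMLAUT_O)

def inspect_dump(path):
    """Read the dump as raw bytes so no locale-dependent decoding happens first."""
    return inspect_bytes(Path(path).read_bytes())
```

For a correctly encoded dump, inspect_dump("mydump.sql") should report True together with a non-zero count; a file saved as latin1 instead would store ö as the single byte f6 and fail the strict decode.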

$ env | grep -i utf
LANG=en_US.UTF-8
XTERM_LOCALE=en_US.UTF-8

The mysql db was created as follows:

mysql> CREATE DATABASE mydb CHARACTER SET utf8; 

The mysql table was created so:

CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(30) NOT NULL,
  `last_name` varchar(30) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `last_name` (`last_name`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

I am importing the dump file like so:

 mysql -u root -psecret mydb < mydump.sql 
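One plausible mechanism, under the assumption that the mysql client connects with a latin1 connection character set (a common default for MySQL 5.x clients when no --default-character-set is given): each of the bytes c3 and b6 is taken as a separate latin1 character and converted to utf8 for the column, so ö is stored as four bytes and read back as Ã¶. The Python sketch below only simulates that conversion on strings and does not connect to MySQL; the function names are hypothetical. MySQL's latin1 is really cp1252, which agrees with ISO-8859-1 for the bytes involved here.

```python
def simulate_latin1_connection(text):
    """Bytes a utf8 column would hold if UTF-8 input were labelled latin1 (simulation only)."""
    utf8_bytes = text.encode("utf-8")         # what the dump file contains
    as_latin1 = utf8_bytes.decode("latin-1")  # each byte read as one latin1 character
    return as_latin1.encode("utf-8")          # those characters stored as utf8

def undo_double_encoding(stored):
    """Reverse the simulation: utf8 -> characters -> latin1 bytes -> original UTF-8 text."""
    return stored.decode("utf-8").encode("latin-1").decode("utf-8")

stored = simulate_latin1_connection("Nöthnagel")
print(stored)                        # b'N\xc3\x83\xc2\xb6thnagel'
print(stored.decode("utf-8"))        # NÃ¶thnagel, what a utf8 client displays
print(undo_double_encoding(stored))  # Nöthnagel
```

Repairing rows in place like this is possible in principle, but when the dump is still available it is usually simpler to re-import it with a connection character set that matches the file.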

Please tell me what is missing from the above.

asked by handros, Nov 05 '12 14:11



People also ask

Which is better utf8 or utf8mb4?

The difference between utf8 and utf8mb4 is that the former can only store characters of up to 3 bytes, while the latter can also store 4-byte characters. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character.

What is the difference between utf8mb4 and utf8 charsets in MySQL?

utf8 can store only 1-, 2- or 3-byte characters, while utf8mb4 can store 4-byte characters as well. The characters utf8 can store are a subset of those utf8mb4 can store.
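To make the byte-count difference concrete, the Python sketch below counts how many UTF-8 bytes each character needs and checks whether a string would fit MySQL's 3-byte utf8 (also known as utf8mb3). The function names are hypothetical, and the check runs on Python strings only; it does not query MySQL.

```python
def utf8_byte_lengths(text):
    """Map each character to the number of bytes it needs in UTF-8."""
    return {ch: len(ch.encode("utf-8")) for ch in text}

def fits_mysql_utf8(text):
    """True if every character needs at most 3 bytes, i.e. lies in the
    Basic Multilingual Plane, which MySQL's 3-byte utf8 can store."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)

print(list(utf8_byte_lengths("Nö€\U0001F600").values()))  # [1, 2, 3, 4]
print(fits_mysql_utf8("Nöthnagel"))   # True
print(fits_mysql_utf8("\U0001F600"))  # False: an emoji needs 4 bytes
```

So the ö in Nöthnagel (2 bytes) fits either charset; only characters outside the BMP, such as most emoji, require utf8mb4.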


1 Answer

I think it might have something to do with collation as well, but I'm not sure. In my case it certainly did, since I had to support Cyrillic.
Try this; it worked for me:

  1. When creating the target database, set its initial collation to utf8_general_ci

  2. Add SET NAMES 'utf8' COLLATE 'utf8_general_ci'; to the top of your SQL file

  3. Run mysql -u root -p --default-character-set=utf8 yourDB < yourSQLfile.sql
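Steps 2 and 3 can be scripted. The Python sketch below prepends the SET NAMES line (unless the file already starts with one) and builds the mysql argument list with --default-character-set; the helper names are hypothetical, and it builds the command without running it.

```python
import shlex
from pathlib import Path

SET_NAMES = "SET NAMES 'utf8' COLLATE 'utf8_general_ci';"

def add_set_names(sql_text):
    """Step 2: put SET NAMES at the top of the dump unless it is already there."""
    if sql_text.lstrip().upper().startswith("SET NAMES"):
        return sql_text
    return SET_NAMES + "\n" + sql_text

def prepare_dump(path):
    """Rewrite the file in place with the SET NAMES line, reading and writing UTF-8."""
    p = Path(path)
    p.write_text(add_set_names(p.read_text(encoding="utf-8")), encoding="utf-8")

def import_command(user, database, charset="utf8"):
    """Step 3: argv for the mysql client. A bare -p makes mysql prompt for
    the password; the dump itself must be supplied on stdin."""
    return ["mysql", "-u", user, "-p", f"--default-character-set={charset}", database]
```

For example, shlex.join(import_command("root", "yourDB")) yields mysql -u root -p --default-character-set=utf8 yourDB, i.e. step 3 without the < yourSQLfile.sql redirect; with subprocess.run you would pass the open file as stdin instead.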

One more thing: to read the UTF-8 data from your database correctly, you'll have to modify your connection string (here, the JDBC URL) as well. For example:

mysql.url=jdbc:mysql://localhost:3306/nbs?useJvmCharsetConverters=false&useDynamicCharsetInfo=false&useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8&useEncoding=true
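The parameters in that URL are MySQL Connector/J connection properties as given in the answer. As a small sanity check, the Python sketch below (hypothetical helper names) extracts the query string from a JDBC URL and verifies that useUnicode=true and characterEncoding=UTF-8 are present; it only parses the string and makes no connection.

```python
from urllib.parse import parse_qs, urlsplit

def jdbc_params(url):
    """Return the query parameters of a JDBC URL as a flat dict.

    urlsplit() treats 'jdbc' as the scheme and leaves 'mysql://host/db'
    in the path, but it still separates the query string after '?'.
    """
    query = urlsplit(url).query
    return {key: values[-1] for key, values in parse_qs(query).items()}

def requests_utf8(url):
    """True if the URL asks the driver to use Unicode with UTF-8 encoding."""
    params = jdbc_params(url)
    return (params.get("useUnicode", "").lower() == "true"
            and params.get("characterEncoding", "").upper() in ("UTF-8", "UTF8"))
```

A URL with no query string yields an empty dict, so requests_utf8 returns False for it.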

Additionally, take a look at what my problem was.

answered by Less, Oct 08 '22 00:10
