Importing UTF-8-encoded data into MySQL is not working for me: UTF-8 characters are corrupted. For example, Nöthnagel is displayed as NÃ¶thnagel.
I created an SQL dump file for the import, and it contains UTF-8-encoded data. For example:
INSERT INTO `users` VALUES(1, 'Fred','Nöthnagel');
The byte sequence representing ö in the file is c3 b6, which I believe is correct. It displays correctly in vim and in my bash shell, which has these environment variables set:
$ env | grep -i utf
LANG=en_US.UTF-8
XTERM_LOCALE=en_US.UTF-8
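A quick way to check both halves of the problem, sketched in Python. The helper names are hypothetical and the sketch only inspects bytes; it does not talk to MySQL. It confirms that the UTF-8 bytes are valid, and shows that reading c3 b6 as Latin-1 produces exactly the garbled "Ã¶". That points to a likely cause, assuming the mysql client was treating its input as latin1 rather than utf8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Latin-1 (the mojibake a latin1 client produces)."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    raw = "Nöthnagel".encode("utf-8")
    print(raw[1:3].hex(" "))               # c3 b6  (the two bytes that encode ö)
    print(is_valid_utf8(raw))              # True
    print(misread_as_latin1("Nöthnagel"))  # NÃ¶thnagel
```

To check the real dump, pass it the file's bytes read in binary mode, e.g. is_valid_utf8(open("mydump.sql", "rb").read()).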
The mysql db was created as follows:
mysql> CREATE DATABASE mydb CHARACTER SET utf8;
The MySQL table was created like this:
CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(30) NOT NULL,
  `last_name` varchar(30) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `last_name` (`last_name`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
I am importing the dump file like so:
mysql -u root -psecret mydb < mydump.sql
Please tell me what is missing from the above.
The difference between utf8 and utf8mb4 is that the former can store only characters of up to 3 bytes, while the latter can also store 4-byte characters. In Unicode terms, utf8 can store only characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character.
MySQL's utf8 can store only 1-, 2- or 3-byte characters, while utf8mb4 can store 4-byte characters as well, so the characters utf8 can store are a subset of those utf8mb4 can store.
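To see which characters the 3-byte limit actually affects, here is a small Python sketch (the helper names are hypothetical). It counts UTF-8 bytes per character and checks whether a string stays within MySQL's utf8 limit. Note that ö needs only two bytes, so on its own it fits in utf8.

```python
def utf8_byte_lengths(text: str) -> dict[str, int]:
    """Map each distinct character to the number of bytes it takes in UTF-8."""
    return {ch: len(ch.encode("utf-8")) for ch in text}


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 UTF-8 bytes, i.e. lies in the BMP."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


if __name__ == "__main__":
    print(utf8_byte_lengths("aö€😀"))   # {'a': 1, 'ö': 2, '€': 3, '😀': 4}
    print(fits_mysql_utf8("Nöthnagel"))  # True
    print(fits_mysql_utf8("hi 😀"))       # False
```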
I think collation might play a part as well, but I'm not sure. In my case it certainly did, since I had to support Cyrillic.
Try this; it worked for me:
1. When creating the target database, set its initial collation to utf8_general_ci.
2. Add SET NAMES 'utf8' COLLATE 'utf8_general_ci'; to the top of your SQL file.
3. Run mysql -u root -p --default-character-set=utf8 yourDB < yourSQLfile.sql
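If you regenerate the dump often, adding the SET NAMES line to the top of the SQL file can be scripted. Below is a minimal Python sketch: the function name add_set_names is hypothetical, and the sketch assumes the dump fits in memory. It prepends the statement once, reads and writes the file as UTF-8, and leaves the rest of the file untouched.

```python
from pathlib import Path

# The statement to place at the very top of the dump.
SET_NAMES = "SET NAMES 'utf8' COLLATE 'utf8_general_ci';\n"


def add_set_names(path: str) -> bool:
    """Prepend SET_NAMES to the dump at `path` unless it is already the first line.

    Returns True if the file was modified, False if it already started with SET_NAMES.
    """
    dump = Path(path)
    text = dump.read_text(encoding="utf-8")
    if text.startswith(SET_NAMES):
        return False  # already present; don't add it twice
    dump.write_text(SET_NAMES + text, encoding="utf-8")
    return True
```

After calling add_set_names("yourSQLfile.sql"), import with --default-character-set=utf8 as before. Checking for an existing header keeps the script safe to run repeatedly on the same file.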
One more thing: to read the UTF-8 data from your database correctly, you'll have to modify your connection string as well. For example:
mysql.url=jdbc:mysql://localhost:3306/nbs?useJvmCharsetConverters=false&useDynamicCharsetInfo=false&useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8&useEncoding=true
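To sanity-check a connection string like the one above before deploying it, you could use a small Python sketch like this one. The helpers are hypothetical, and they only parse the query string; they do not contact MySQL or load Connector/J. The check confirms that useUnicode and characterEncoding request UTF-8.

```python
from urllib.parse import parse_qsl


def jdbc_params(url: str) -> dict[str, str]:
    """Return the key=value properties that follow '?' in a JDBC URL."""
    _, _, query = url.partition("?")
    return dict(parse_qsl(query))


def requests_utf8(url: str) -> bool:
    """True if the URL sets useUnicode=true and characterEncoding to UTF-8."""
    params = jdbc_params(url)
    return (params.get("useUnicode", "").lower() == "true"
            and params.get("characterEncoding", "").upper() in {"UTF-8", "UTF8"})
```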
Additionally, take a look at what my problem was.