Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC?

This is how my connection is set:
Connection conn = DriverManager.getConnection(url + dbName + "?useUnicode=true&characterEncoding=utf-8", userName, password);

And I'm getting the following error when tyring to add a row to a table:
Incorrect string value: '\xF0\x90\x8D\x83\xF0\x90...' for column 'content' at row 1

I'm inserting thousands of records, and I always get this error when the text contains \xF0 (i.e. the the incorrect string value always starts with \xF0).

The column's collation is utf8_general_ci.

What could be the problem?

like image 992
Lior Avatar asked Jun 08 '12 23:06

Lior


People also ask

What is incorrect string value?

To conclude, the ERROR 1366: Incorrect string value happens when MySQL can't insert the value you specified into the table because of incompatible encoding. You need to modify or remove characters that have 4-bytes UTF-8 encoding, or you can change the encoding and collation used by MySQL.

What is the difference between UTF-8 and utf8mb4?

The difference between utf8 and utf8mb4 is that the former can only store 3 byte characters, while the latter can store 4 byte characters. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character.

What is utf8_unicode_ci?

utf8_unicode_ci uses the standard Unicode Collation Algorithm, supports so called expansions and ligatures, for example: German letter ß (U+00DF LETTER SHARP S) is sorted near "ss" Letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".


2 Answers

MySQL's utf8 permits only the Unicode characters that can be represented with 3 bytes in UTF-8. Here you have a character that needs 4 bytes: \xF0\x90\x8D\x83 (U+10343 GOTHIC LETTER SAUIL).

If you have MySQL 5.5 or later you can change the column encoding from utf8 to utf8mb4. This encoding allows storage of characters that occupy 4 bytes in UTF-8.

You may also have to set the server property character_set_server to utf8mb4 in the MySQL configuration file. It seems that Connector/J defaults to 3-byte Unicode otherwise:

For example, to use 4-byte UTF-8 character sets with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding out of the Connector/J connection string. Connector/J will then autodetect the UTF-8 setting.

like image 158
Joni Avatar answered Oct 05 '22 23:10

Joni


The strings that contain \xF0 are simply characters encoded as multiple bytes using UTF-8.

Although your collation is set to utf8_general_ci, I suspect that the character encoding of the database, table or even column may be different. They are independent settings. Try:

ALTER TABLE database.table MODIFY COLUMN col VARCHAR(255)       CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL; 

Substitute whatever your actual data type is for VARCHAR(255)

like image 30
Eric J. Avatar answered Oct 05 '22 22:10

Eric J.