
How to fix "Incorrect string value" errors?

Tags:

mysql

After noticing that an application tended to discard random emails because of "Incorrect string value" errors, I went through and switched many text columns to the utf8 column charset and the default column collation (utf8_general_ci) so that they would accept them. This fixed most of the errors, and the application also stopped getting SQL errors when it hit non-Latin emails.

Despite this, some of the emails still cause the program to hit incorrect string value errors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)

The contents column is a MEDIUMTEXT datatype that uses the utf8 column charset and the utf8_general_ci column collation. There are no flags that I can toggle on this column.

Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:

  • What is causing that error? (yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive)
  • How can I fix it?
  • What are the likely effects of such a fix?

One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL and have no idea whether such a fix makes sense.

Brian asked Jul 22 '09 20:07


People also ask

What does incorrect string value mean?

In short, ERROR 1366 (Incorrect string value) happens when MySQL can't insert the value you specified into the table because its encoding is incompatible with the column's character set. You need to modify or remove the offending characters (for example, characters that take 4 bytes in UTF-8 when the column is utf8/utf8mb3), or change the character set and collation MySQL uses.

What is the difference between utf8 and utf8mb4?

The difference between utf8 and utf8mb4 is that the former can store at most 3 bytes per character, while the latter can use up to 4 bytes per character. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character.
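To make the 3-byte limit concrete: a character needs four bytes in UTF-8 exactly when its code point is above U+FFFF, i.e. outside the Basic Multilingual Plane. The Python sketch below checks whether a string would need utf8mb4, and shows the lossy alternative of replacing such characters. The function names are hypothetical helpers for illustration, not part of MySQL or any library.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character encodes to 4 bytes in UTF-8 (code point > U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def strip_4byte_chars(text: str, replacement: str = "\ufffd") -> str:
    """Lossy workaround for utf8/utf8mb3 columns: replace every 4-byte character.

    The default replacement U+FFFD is itself only 3 bytes in UTF-8,
    so the result fits a utf8mb3 column.
    """
    return "".join(replacement if ord(ch) > 0xFFFF else ch for ch in text)


print(needs_utf8mb4("Grüße"))                # ü and ß are 2 bytes each
print(needs_utf8mb4("smile \U0001F600"))     # the emoji is 4 bytes
print(strip_4byte_chars("smile \U0001F600") == "smile \ufffd")
```

Replacing characters loses data, so switching the column and connection to utf8mb4 is usually the better fix; the stripping helper only shows what "remove the 4-byte characters" amounts to.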

What is utf8_unicode_ci?

utf8_unicode_ci uses the standard Unicode Collation Algorithm and supports so-called expansions and ligatures. For example, the German letter ß (U+00DF LATIN SMALL LETTER SHARP S) is sorted near "ss", and the letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".


1 Answer

UPDATE to the answer below:

At the time the question was asked, "UTF8" in MySQL meant utf8mb3. Since then, utf8mb4 has been added, but to my knowledge MySQL's "UTF8" was not switched to mean utf8mb4.

That means you need to specify "utf8mb4" explicitly if that is what you mean (and you should use utf8mb4).

I'll keep this note here instead of just editing the answer, to make clear that there is still a difference when saying "UTF8".

Original

I would not suggest Richie's answer, because it mangles the data inside the database. It would not fix your problem, only hide it, and you would no longer be able to perform essential database operations on the corrupted data.

If you encounter this error, either the data you are sending is not UTF-8 encoded, or your connection is not UTF-8. First, verify that the data source (a file, ...) really is UTF-8.
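As a sketch of that first check, assuming the raw email body is available as bytes: the bytes from the question's error message, \xE4\xC5\xCC\xC9\xD3\xD8, are not valid UTF-8 at all (0xE4 starts a three-byte sequence, but 0xC5 is not a continuation byte). They do decode cleanly as KOI8-R, a legacy Cyrillic encoding, giving the Russian word "Делись". That suggests the application passed a Russian email through without converting it. The helper name `to_utf8` and the KOI8-R fallback are assumptions for illustration; in practice the source charset should come from the message's own Content-Type header rather than a guess.

```python
def to_utf8(raw: bytes, declared_charset: str = "koi8_r") -> bytes:
    """Return raw unchanged if it is valid UTF-8, otherwise transcode it.

    declared_charset is an assumption: use the charset the message
    itself declares (e.g. in its Content-Type header), not a guess.
    """
    try:
        raw.decode("utf-8")  # strict by default: raises on invalid sequences
        return raw
    except UnicodeDecodeError:
        # Decode with the source charset, then re-encode as UTF-8 for MySQL.
        # Note: some charsets can themselves raise here; KOI8-R maps all 256 bytes.
        return raw.decode(declared_charset).encode("utf-8")


sample = b"\xE4\xC5\xCC\xC9\xD3\xD8"  # bytes from the error message
fixed = to_utf8(sample)
print(fixed.decode("utf-8") == "Делись")  # True
```

Data that is already valid UTF-8 passes through untouched, so the conversion is safe to apply to every message before it is inserted.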

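Then check your database connection; you should run this right after connecting:

SET NAMES 'utf8mb4';

SET NAMES sets character_set_client, character_set_connection and character_set_results in one go. Avoid following it with SET CHARACTER SET utf8mb4: that statement sets character_set_connection to the default database's character set, which may not be utf8mb4.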

Next, verify that the tables where the data is stored have the utf8mb4 character set:

SELECT
  `tables`.`TABLE_NAME`,
  `collations`.`character_set_name`
FROM
  `information_schema`.`TABLES` AS `tables`,
  `information_schema`.`COLLATION_CHARACTER_SET_APPLICABILITY` AS `collations`
WHERE
  `tables`.`table_schema` = DATABASE()
  AND `collations`.`collation_name` = `tables`.`table_collation`;

Last, check your database settings:

mysql> show variables like '%colla%';
mysql> show variables like '%charac%';
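Reading those variable lists by eye is error-prone, so here is a small Python sketch that flags the character-set variables not set to utf8mb4. It assumes you have fetched the rows of SHOW VARIABLES LIKE 'character_set%' as (name, value) pairs, which is what a DB-API cursor's fetchall() typically returns; the function name and the sample rows are hypothetical, made up for illustration. character_set_system, character_set_filesystem and character_set_dir are deliberately ignored, since they do not control how row data is encoded.

```python
# Variables that must agree for text to travel client -> server -> table intact.
RELEVANT = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
    "character_set_database",
    "character_set_server",
)


def non_utf8mb4(rows):
    """Return {variable: value} for relevant charset variables not set to utf8mb4.

    rows: iterable of (Variable_name, Value) pairs, e.g. from
    cursor.fetchall() after SHOW VARIABLES LIKE 'character_set%'.
    A missing variable is reported with the value None.
    """
    settings = dict(rows)
    return {
        name: settings.get(name)
        for name in RELEVANT
        if settings.get(name) != "utf8mb4"
    }


# Hypothetical output of SHOW VARIABLES LIKE 'character_set%' on a misconfigured setup.
sample_rows = [
    ("character_set_client", "utf8mb4"),
    ("character_set_connection", "latin1"),
    ("character_set_database", "utf8mb3"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "utf8mb4"),
    ("character_set_server", "utf8mb4"),
    ("character_set_system", "utf8mb3"),
]
print(non_utf8mb4(sample_rows))
```

An empty result means the connection and server defaults are consistent; the table and column character sets still need to be checked separately, as described above.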

If source, transport and destination are utf8mb4, your problem is gone;)

nico gawenda answered Oct 11 '22 07:10