I have to redesign a class where (amongst other things) UTF-8 strings are double-encoded wrongly:
$string = iconv('ISO-8859-1', 'UTF-8', $string);
:
$string = utf8_encode($string);
These faulty strings have been saved into multiple table fields all over a MySQL database. All fields being affected use collation utf8_general_ci
.
Usually I'd setup a little PHP patch script, looping thru the affected tables, SELECTing the records, correct the faulty records by using utf8_decode()
on the double-encoded fields and UPDATE them.
As I got many and huge tables this time, and the error only affects german umlauts (äöüßÄÖÜ), I'm wondering if there's a solution smarter/faster than that.
Are pure MySQL solutions like the following safe and recommendable?
UPDATE `table` SET `col` = REPLACE(`col`, 'ä', 'ä');
Any other solutions/best practices?
To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. Replace dbname with the database name: Copy ALTER DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci; To exit the mysql program, type \q at the mysql> prompt.
Similarly, here's the command to change character set of MySQL table from latin1 to UTF8. Replace table_name with your database table name. mysql> ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci; Hopefully, the above tutorial will help you change database character set to utf8mb4 (UTF-8).
utf-8 can store only 1, 2 or 3 bytes characters, while utf8mb4 can store 4 bytes characters as well. utf-8 is a subset of characters given by utf8mb4 .
To solve the problem open the exported SQL file, search and replace the utf8mb4 with utf8 , after that search and replace the utf8mb4_unicode_520_ci with utf8_general_ci . Save the file and import it into your database. After that, change the wp-config. php charset option to utf8 , and the magic starts.
Alter the table to change the column character set to Latin-1. You will now have singly-encoded UTF-8 strings, but sitting in a field whose collation is supposed to be Latin-1.
What you do then is, change the column character set back to UTF-8 via the binary character set - that way MySQL doesn't convert the characters at any point.
ALTER TABLE MyTable MODIFY MyColumn ... CHARACTER SET latin1
ALTER TABLE MyTable MODIFY MyColumn ... CHARACTER SET binary
ALTER TABLE MyTable MODIFY MyColumn ... CHARACTER SET utf8
(is the correct syntax iirc; put the appropriate column type in where ...
is)
I tried the posted solutions, but my DB kept spitting up errors. Eventually I stumbled upon the following solution (in a forum I believe, but I can't remember where):
UPDATE table_name SET col_name = CONVERT(CONVERT(CONVERT(col_name USING latin1) USING binary) USING utf8);
and it worked a treat. Hope this helps anyone who stumbled here from desperate google searching like me.
NOTE: This is of course assuming your double encoded character issues originate from an overly helpful MySQL conversion from latin1 to utf8, but I believe that's where most of these "corrupted characters" happen. This basically does the same conversion as mentioned above back to latin1, then binary, then to utf8 (using the binary step as a way to prevent the re-encoding of the already encoded latin1 entities)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With