I've recently noticed that, whenever I start a new WordPress project, my tables' collation automatically changes from utf8_unicode_ci (which I select when I create a new DB from phpMyAdmin) to utf8mb4_unicode_520_ci.
Also, I've noticed in phpMyAdmin under “General Settings” that the server connection collation defaults to utf8mb4_unicode_520_ci.
I'm running MySQL Server 5.7.17 and phpMyAdmin 4.6.6 on Ubuntu 17.04.
My questions are as follows:
How can I stop WordPress from switching to utf8mb4? I've experienced problems when migrating WP sites to an older MySQL server which does not support it.
Should I use utf8mb4 over utf8, and collation utf8mb4_unicode_520_ci over utf8_unicode_ci?
The difference between utf8 and utf8mb4 is that the former can only store characters of up to 3 bytes, while the latter can also store 4-byte characters. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character.
MySQL's utf8 can store only characters that take 1, 2, or 3 bytes, while utf8mb4 can store 4-byte characters as well. In other words, the characters of utf8 are a subset of those of utf8mb4.
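To make the byte counts concrete, here is a minimal Python sketch. The helper names `utf8_byte_length` and `fits_mysql_utf8` are made up for this illustration; the check simply assumes that MySQL's utf8 accepts exactly the code points up to U+FFFF (the Basic Multilingual Plane).

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes the character needs in real UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """True if every character is in the Basic Multilingual Plane
    (code point <= U+FFFF), i.e. needs at most 3 bytes."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# A Latin letter, an accented letter, a Chinese character, and an Emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
for ch in samples:
    print(repr(ch), utf8_byte_length(ch), fits_mysql_utf8(ch))
```

Only the last sample (the Emoji) needs 4 bytes, so it is the only one that would require utf8mb4.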
From MySQL 8.0, utf8mb4 is the default character set, and the default collation for utf8mb4 is utf8mb4_0900_ai_ci. MySQL 8.0 also ships with a whole new set of Unicode collations for the utf8mb4 character set. These are based on Unicode 9.0.
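To solve the problem, open the exported SQL file and search and replace utf8mb4_unicode_520_ci with utf8_general_ci first, then search and replace any remaining utf8mb4 with utf8. (Doing it in the opposite order would turn the collation name into utf8_unicode_520_ci, and the second replacement would then find nothing.) Save the file and import it into your database. After that, change the DB_CHARSET option in wp-config.php to utf8. Be aware that any 4-byte characters in the data, such as Emoji, cannot be stored in utf8.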
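The search-and-replace on the dump could be scripted roughly like this. It is only a sketch: the function names and the file names in the usage note are hypothetical, and, like a manual search and replace, it also rewrites any occurrence of these strings inside your actual post content.

```python
def downgrade_dump(sql: str) -> str:
    """Rewrite utf8mb4 references in a SQL dump to utf8."""
    # Replace the full collation name first; if the charset were replaced
    # first, this name would no longer be found.
    sql = sql.replace("utf8mb4_unicode_520_ci", "utf8_general_ci")
    # Then replace the remaining charset references (this also turns other
    # utf8mb4_* collation names into their utf8_* counterparts).
    return sql.replace("utf8mb4", "utf8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a dump, rewrite it, and save the result under a new name."""
    with open(src_path, encoding="utf-8") as src:
        converted = downgrade_dump(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(converted)
```

For example, `convert_file("dump.sql", "dump_utf8.sql")` would leave the original export untouched, which is worth doing in case the import fails.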
In the past, there was only utf8; now (as of MySQL 8.0) utf8mb4 is the default character set.
In the past, _general_ci was the default collation; then _unicode_ci (Unicode 4.0) was better, then _unicode_520_ci (Unicode 5.2.0). In MySQL 8.0, the default is _0900_ai_ci (Unicode 9.0).
Meanwhile, the road is full of potholes generated by MySQL's past mistakes. And WP designers are driving in a big tank that does not notice the potholes.
MySQL 5.6 was a big pothole that swallowed up many a WP user because of the 767-byte limit on index keys, together with WP indexes on the overly long VARCHAR(255) and the possibility of using utf8mb4. You are well past it by having 5.7.17. (Your future move to 8.0 will be less bumpy.)
That is, newly created databases/tables/columns on 5.7.7+ should not experience the 767 problem, but things migrated from older versions (5.5.3+) may have issues, especially if something causes you to change to utf8mb4.
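The arithmetic behind the 767 problem can be sketched in a few lines of Python. The constant and function names are invented for this illustration, and the sketch assumes the 767-byte limit that applies to InnoDB's older COMPACT and REDUNDANT row formats.

```python
# Index key limit in bytes (assumption: COMPACT/REDUNDANT row format).
INDEX_KEY_LIMIT_BYTES = 767

# Worst-case bytes per character that MySQL reserves for each charset.
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def index_bytes(char_length: int, charset: str) -> int:
    """Worst-case size in bytes of an index on VARCHAR(char_length)."""
    return char_length * MAX_BYTES_PER_CHAR[charset]


def max_indexable_chars(charset: str) -> int:
    """Longest VARCHAR that can be fully indexed under the limit."""
    return INDEX_KEY_LIMIT_BYTES // MAX_BYTES_PER_CHAR[charset]


for charset in ("utf8", "utf8mb4"):
    size = index_bytes(255, charset)
    verdict = "fits" if size <= INDEX_KEY_LIMIT_BYTES else "too long"
    print(charset, size, verdict, max_indexable_chars(charset))
```

With utf8, VARCHAR(255) needs 765 bytes and just fits; with utf8mb4 it needs 1020 bytes, which is why 191 is the longest fully indexable length under this limit.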
What to do? I'll probably run out of space trying to spell out all the options. So provide the history of the data, the upgrade path (if any), the current settings, the ROW_FORMAT of the tables, the CHARACTER SET and COLLATION of the columns, and the output of SHOW VARIABLES LIKE 'char%';
Where should you be? For 5.7.7+, utf8mb4 and utf8mb4_unicode_520_ci wherever practical. That charset gives you Emoji and all of Chinese (utf8 does not). That collation is the best available, although you might be hard pressed to notice where it matters.
Note: the first part of the collation name is the only character set that it works with. That is, utf8_unicode_ci does not work with utf8mb4.
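The naming rule in the note can be expressed as a tiny check. This is just an illustration of the convention, with hypothetical helper names; it does not query a server.

```python
def charset_of(collation: str) -> str:
    """The character set a collation belongs to is the part of its
    name before the first underscore."""
    return collation.split("_", 1)[0]


def is_compatible(charset: str, collation: str) -> bool:
    """True if the collation can be used with the given charset."""
    return charset_of(collation) == charset


print(is_compatible("utf8mb4", "utf8mb4_unicode_520_ci"))
print(is_compatible("utf8mb4", "utf8_unicode_ci"))
```

The first call pairs a utf8mb4 collation with utf8mb4; the second is the mismatched pairing from the note.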