Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is table CHARSET set to utf8mb4 and COLLATION to utf8mb4_unicode_520_ci

I've recently noticed that, when ever I start a new WordPress project, my tables' collation automatically changes from utf8_unicode_ci (which I select when I create a new DB from phpMyAdmin) to utf8mb4_unicode_520_ci.

Also, I've noticed in phpMyAdmin under “General Settings” that server connection Collation defaults to utf8mb4_unicode_520_ci.

I'm running MySQL Server 5.7.17 and phpMyAdmin 4.6.6 on Ubuntu 17.04.

My questions are following:

  1. Why is this happening?
  2. If possible, how do I prevent this? Because of utf8mb4 I've experienced problems when migrating WP sites to an older MySQL server which does not support it.
  3. Is point 2. advisable? Are there any benefits in using charset utf8mb4 over utf8, and collation utf8mb4_unicode_520_ci over utf8_unicode_ci?
like image 349
Томица Кораћ Avatar asked Apr 26 '17 20:04

Томица Кораћ


People also ask

Should I use utf8mb4 or UTF-8?

The difference between utf8 and utf8mb4 is that the former can only store 3 byte characters, while the latter can store 4 byte characters. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character.

What does the utf8mb4 character set allow for?

utf-8 can store only 1, 2 or 3 bytes characters, while utf8mb4 can store 4 bytes characters as well. utf-8 is a subset of characters given by utf8mb4 .

What is the default collation for utf8mb4?

From MySQL 8.0, utf8mb4 is the default character set, and the default collation for utf8mb4 is utf8mb4_0900_ai_ci. MySQL 8.0 is also coming with a whole new set of Unicode collations for the utf8mb4 character set. This will allow use of the complete Unicode 9.0.

How do I change utf8mb4 to UTF-8?

To solve the problem open the exported SQL file, search and replace the utf8mb4 with utf8 , after that search and replace the utf8mb4_unicode_520_ci with utf8_general_ci . Save the file and import it into your database. After that, change the wp-config. php charset option to utf8 , and the magic starts.


1 Answers

In the past, there was only utf8; in the future, utf8mb4 will be the default character set. now utf8mb4 is the default character set.

In the past, _general_ci was the default collation; then _unicode_ci (Unicode 4.0) was better, then _unicode_520_ci (Unicode 5.20). In the future (MySQL 8.0), the default will be _0900_ci_ai (Unicode 9.0).

Meanwhile, the road is full of potholes generated by MySQL's past mistakes. And WP designers are driving in a big tank that does not notice the potholes.

MySQL 5.6 was a big pothole that swallowed up many a WP user because of a 767 limit on indexes together with WP indexes on the overly-long VARCHAR(255) and the possibility of using utf8mb4. You are well past it by having 5.7.17. (Your future move to 8.0 will be less bumpy.)

That is, newly created databases/tables/columns on 5.7.7+ should not experience the 767 problem, but things migrated from older versions (5.5.3+) may have issues, especially if something causes you to change to utf8mb4.

What to do? I'll probably run out of space trying to spell out all the options. So provide the history of the data, the upgrade path (if any), the current settings, the ROW_FORMAT of the tables, the CHARACTER SET and COLLATION of the columns, the output of SHOW VARIABLES LIKE 'char%';

Where should you be? For 5.7.7+, utf8mb4 and utf8mb4_unicode_520_ci wherever practical. That charset gives you Emoji and all of Chinese (utf8 does not). That collation is the best available, although you might be hard pressed to notice where it matters.

Note: the first part of the collation name is the only character set that it works with. That is utf8_unicode_ci does not work with utf8mb4.

like image 182
Rick James Avatar answered Oct 06 '22 16:10

Rick James