MySQL collation: utf8mb4_unicode_ci vs "utf8mb4 - default collation"

1 Answer

utf8mb4_default?? Where do you see this?

The default collation (before MySQL 8.0) for utf8mb4 is utf8mb4_general_ci. This compares only one character at a time, so ss is not considered equal to ß. Most of the other collations for utf8mb4 do consider them equal.
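
A quick way to see that difference for yourself is to force each collation on a literal comparison. This is only a sketch; it assumes the connection character set is utf8mb4 (hence the SET NAMES), and the column aliases are made up for illustration.

```sql
-- Assumption: the client can talk utf8mb4; SET NAMES makes the literals utf8mb4.
SET NAMES utf8mb4;

-- COLLATE forces the collation used for each comparison.
-- general_ci compares character by character; unicode_ci can expand ß to ss.
SELECT 'ss' = 'ß' COLLATE utf8mb4_general_ci AS general_eq,
       'ss' = 'ß' COLLATE utf8mb4_unicode_ci AS unicode_eq;
```

If the description above holds on your server, general_eq should come back as 0 and unicode_eq as 1.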

Next in the list of "better" collations for general use (as opposed to Spanish-specific, etc) is utf8mb4_unicode_ci. This matches the Unicode Collation Algorithm version 4.0, written several years ago.

Then comes utf8mb4_unicode_520_ci (Unicode Collation Algorithm version 5.2.0), which handles more things "correctly".

In MySQL 8.0, there is a collation based on version 9.0.0 of the algorithm, utf8mb4_0900_ai_ci, and it is the new default for utf8mb4.
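
To check which collation your own server treats as the default for utf8mb4, something like the following should work. It is a sketch, not the only way to look this up; the result depends on the server version and on any overrides in its configuration.

```sql
-- List the collation flagged as the default for the utf8mb4 character set.
-- `Default` needs backticks because DEFAULT is a reserved word.
SHOW COLLATION WHERE Charset = 'utf8mb4' AND `Default` = 'Yes';

-- The server-wide settings, which may differ from the per-charset default.
SELECT @@character_set_server, @@collation_server;
```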

For details on the differences, see http://mysql.rjweb.org/utf8_collations.html. (Note: the "utf8" and "utf8mb4" collations behave the same for the information provided on that page.) The first thing to note:

utf8_general_ci              A=a=À=Á=Â=Ã=Ä=Å=à=á=â=ã=ä=å=Ā=ā=Ą=ą    Aa  ae          az
utf8_unicode_ci              A=a=ª=À=Á=Â=Ã=Ä=Å=à=á=â=ã=ä=å=Ā=ā=Ą=ą  Aa  ae          az            Æ=æ
utf8_unicode_520_ci          A=a=ª=À=Á=Â=Ã=Ä=Å=à=á=â=ã=ä=å=Ā=ā=Ą=ą  Aa  ae=Æ=æ      az

These 3 lines point out 3 different treatments of Æ and æ.

In all three, the two ligatures are treated as equal to each other ("case insensitive").
general does not sort them anywhere near the other A's. (Far below on that page, we see that they sort after Z.)
unicode sorts them after all the A's, and just before B, as if they were a separate "letter".
unicode_520 treats them as equal to the letter pair ae.
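
The same three treatments can be probed directly with literal comparisons. Again a sketch: it assumes a utf8mb4 connection, and the aliases are just illustrative names.

```sql
-- Assumption: the client can talk utf8mb4.
SET NAMES utf8mb4;

SELECT 'Æ' = 'æ'  COLLATE utf8mb4_general_ci     AS ligatures_equal,  -- case-insensitive match
       'ae' = 'Æ' COLLATE utf8mb4_unicode_ci     AS ae_in_unicode,    -- Æ as a separate "letter"
       'ae' = 'Æ' COLLATE utf8mb4_unicode_520_ci AS ae_in_520;        -- Æ as the letter pair ae
```

Going by the chart, I would expect 1, 0 and 1 respectively, but run it on your own version to confirm.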

For MySQL 5.7, and without any specific language requirements, I would use utf8mb4_unicode_520_ci.
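
In practice that means naming the collation explicitly rather than relying on the default. A minimal sketch follows; the table names example_names and example_legacy are hypothetical, and converting a large existing table rebuilds it, so try it on a copy first.

```sql
-- New table: set the character set and collation as table defaults.
CREATE TABLE example_names (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

-- Existing table (hypothetical name): convert all character columns
-- and the table default in one statement.
ALTER TABLE example_legacy
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
```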

Back to your question of "why". Changing defaults runs the risk of hurting existing installations more than it helps. So, I guess, the designers were conservative. On the other hand, 8.0 has a lot of major changes, so there was less reluctance to change. Hence, the move to utf8mb4_0900_ai_ci.


answered Oct 12 '22 18:10

Rick James


Tags:

mysql

collation

mysql-workbench

Yevgeniy Afanasyev
