What is the difference between the utf8mb4_0900_ai_ci and utf8_unicode_ci collations in MySQL (especially in terms of performance)? Update: Are there similar differences between utf8mb4_unicode_ci and utf8mb4_0900_ai_ci?


What's the difference between utf8_unicode_ci and utf8mb4_0900_ai_ci


- The encoding is the same. That is, a character that exists in both character sets is stored as the same bytes.
- The character set is different. utf8mb4 has more characters (it adds the 4-byte ones).
- The collation (how comparisons and sorting are done) is different.
- The performance is different, but it rarely matters.
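To make the first two points concrete outside MySQL, here is a small Python sketch (not MySQL code) that prints the UTF-8 bytes of a few characters. A character that exists in both character sets is stored as exactly these bytes under utf8 and utf8mb4; only the 4-byte ones, such as the Emoji, are exclusive to utf8mb4. The helper name `utf8_bytes` is hypothetical.

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a string (hypothetical helper)."""
    return ch.encode("utf-8")

# The same bytes are stored whether the column is utf8 or utf8mb4;
# the last column shows how many bytes each character needs.
for ch in ["A", "é", "中", "😀"]:
    b = utf8_bytes(ch)
    print(ch, b.hex(" "), len(b))
# A 41 1
# é c3 a9 2
# 中 e4 b8 ad 3
# 😀 f0 9f 98 80 4
```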

utf8_unicode_ci implies CHARACTER SET utf8 (an alias for utf8mb3 in MySQL), which includes only the 1-, 2-, and 3-byte UTF-8 characters. Hence it cannot store most Emoji and some Chinese characters, which need 4 bytes.
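A quick way to see what the 3-byte utf8 can hold: a character fits only if its UTF-8 form is at most 3 bytes, which is the same as its code point being at most U+FFFF (the Basic Multilingual Plane). The function `fits_utf8mb3` below is a hypothetical helper for illustration, not part of MySQL or any client library.

```python
def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most 3 UTF-8 bytes (code point <= U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)

print(fits_utf8mb3("café 中文"))          # True
print(fits_utf8mb3("party \U0001F389"))   # False: U+1F389 (party popper) needs 4 bytes
print(fits_utf8mb3("\U0002070E"))         # False: a CJK Extension B ideograph, 4 bytes
```

Checking the code point rather than encoding each character is equivalent here, because UTF-8 uses 4 bytes exactly for code points above U+FFFF.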

utf8mb4_unicode_ci implies CHARACTER SET utf8mb4; it is the counterpart of utf8_unicode_ci for the 4-byte CHARACTER SET utf8mb4.

The Unicode Consortium has revised its collation specification (the Unicode Collation Algorithm, UCA) over the years. Here is how its versions map to MySQL collation names:

4.0.0   _unicode_
5.2.0   _unicode_520_
9.0.0   _0900_
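Because the UCA version is encoded in the collation name, a small Python sketch can recover it. The table and the function `uca_version` are hypothetical helpers built only from the mapping above; names that match none of these fragments (for example `_general_ci` or plain `_bin` collations) return None.

```python
from typing import Optional

# Name fragment -> UCA version. Order matters: "_unicode_520_"
# also contains "_unicode_", so the more specific fragment is checked first.
UCA_BY_FRAGMENT = [
    ("_0900_", "9.0.0"),
    ("_unicode_520_", "5.2.0"),
    ("_unicode_", "4.0.0"),
]

def uca_version(collation: str) -> Optional[str]:
    """Return the UCA version implied by a MySQL collation name, or None."""
    for fragment, version in UCA_BY_FRAGMENT:
        if fragment in collation:
            return version
    return None

for name in ["utf8_unicode_ci", "utf8mb4_unicode_520_ci",
             "utf8mb4_0900_ai_ci", "utf8mb4_general_ci"]:
    print(name, uca_version(name))
```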

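Most of the differences are in areas that most people never encounter. One example: the 5.2.0-based and later collations distinguish and order Emoji, whereas the 4.0-based ones (such as utf8mb4_unicode_ci) treat all characters outside the Basic Multilingual Plane, including most Emoji, as equal to one another.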
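One way to picture that Emoji change is a toy sort key in Python. It assumes, as a simplification, that an older collation gives every character outside the Basic Multilingual Plane one shared weight, so any two such Emoji compare equal, while a newer collation gives each its own weight. Both key functions and the `SHARED_WEIGHT` constant are hypothetical and far simpler than the real UCA weight tables; they only show why equality and ordering change.

```python
# Hypothetical weight shared by all supplementary characters in the
# "old" model (an assumption for illustration only).
SHARED_WEIGHT = 0xFFFD

def old_key(text: str) -> tuple:
    """Toy key: every code point above U+FFFF collapses to one weight."""
    return tuple(SHARED_WEIGHT if ord(ch) > 0xFFFF else ord(ch) for ch in text)

def new_key(text: str) -> tuple:
    """Toy key: every code point keeps its own weight."""
    return tuple(ord(ch) for ch in text)

sushi, beer = "\U0001F363", "\U0001F37A"   # sushi and beer mug Emoji
print(old_key(sushi) == old_key(beer))     # True: indistinguishable
print(new_key(sushi) == new_key(beer))     # False: distinguished
print(sorted([beer, sushi], key=new_key) == [sushi, beer])  # True: ordered
```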

The suffixes (from the MySQL documentation):

_bin      -- just compare the bits; don't consider case folding, accents, etc
_ci       -- explicitly case insensitive (A=a) and implicitly accent insensitive (a=á)
_ai_ci    -- explicitly case insensitive and accent insensitive
_as (etc) -- accent-sensitive (etc)
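The suffixes can be approximated in Python with key functions: two strings are equal under a collation when their keys are equal. This is a rough sketch that assumes Unicode decomposition (NFD) plus `str.casefold()` is a fair stand-in for MySQL's weight tables; it is not how MySQL implements collations, and the function names are hypothetical.

```python
import unicodedata
from typing import Callable

def strip_accents(text: str) -> str:
    """Decompose (NFD) and drop combining marks, e.g. 'á' -> 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def key_bin(text: str) -> str:
    return text                              # _bin: raw code points, no folding

def key_as_ci(text: str) -> str:
    return text.casefold()                   # _as_ci: fold case, keep accents

def key_ai_ci(text: str) -> str:
    return strip_accents(text).casefold()    # _ai_ci (and plain _ci): fold case and accents

def equal(a: str, b: str, key: Callable[[str], str]) -> bool:
    return key(a) == key(b)

print(equal("A", "a", key_bin))     # False
print(equal("A", "a", key_as_ci))   # True
print(equal("a", "á", key_as_ci))   # False
print(equal("A", "á", key_ai_ci))   # True
```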

Performance:

_bin         -- simple, fast
_general_ci  -- does not handle multi-letter expansions (e.g. ss=ß), so somewhat fast
...          -- slower
_0900_       -- (8.0) much faster because of a rewrite
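The `_general_ci` remark is about expansions: a scheme that maps one character to one weight cannot make the single letter ß equal the two letters ss, while the UCA-based collations can, at the cost of extra work. A toy Python illustration, assuming per-character lowercasing approximates the simple scheme and `str.casefold()` (which expands ß to ss) approximates the expanding one; neither function is MySQL's actual code.

```python
def simple_key(text: str) -> str:
    """Toy 1:1 scheme: fold each character on its own, no expansions."""
    return "".join(ch.lower() for ch in text)

def expanding_key(text: str) -> str:
    """Toy expanding scheme: casefold turns 'ß' into 'ss'."""
    return text.casefold()

print(simple_key("Straße") == simple_key("STRASSE"))        # False
print(expanding_key("Straße") == expanding_key("STRASSE"))  # True
```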

However, collation speed is usually the least of the performance issues in a query. Indexes, JOINs, subqueries, table scans, etc. are much more critical to performance.


answered Oct 09 '22 21:10

Rick James


Tags: mysql, unicode

Kamil Kiełczewski
