Does using ASCII/Latin Charset speed up the database?

It would seem that using the ASCII charset for most fields and specifying utf8 only for the fields that need it would reduce the amount of I/O the database must perform by 100%.

Anyone know if this is true?

Update: The above was not really my question. I should have said: use Latin for the default character set and specify utf8mb4 only for the fields that need it. The thinking being that using 1 byte vs 2 bytes should improve I/O by 100%. Sorry for the confusion.

asked Jul 23 '18 23:07

mbalsam

2 Answers

Short Answer: Not worth worrying about.

Long Answer:

Two issues:

Speed:

Comparing two strings under the corresponding _bin COLLATION (ascii_bin or utf8_bin) is as simple as comparing the bytes, so there is no significant difference between the encodings. Other collations can differ, with ascii being faster. But the difference is insignificant compared to the effort of fetching rows, etc.
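To illustrate why a _bin comparison is cheap, here is a minimal sketch in Python. It assumes that Python's byte-wise comparison of UTF-8 data is a fair stand-in for what a _bin collation does (it ignores MySQL details such as trailing-space handling), and the word list is made up for the example.

```python
# Sample strings (illustrative only, not from the original answer).
words = ["zebra", "Apple", "école", "apple", "Zürich", "日本", "😀"]

# Order by the raw UTF-8 bytes, as a binary comparison would.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

# Python compares str values by code point.
by_code_points = sorted(words)

print(by_bytes)
print(by_bytes == by_code_points)
```

The two orderings agree because UTF-8 preserves code point order, so no lookup tables are needed. The price is that the order is not linguistic: every uppercase ASCII letter sorts before every lowercase one.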

Space:

Ascii is a subset of utf8. utf8 stores only 1 byte for each ascii character, just as ascii does. So, no space difference. (Accented letters in Western Europe need either 1 byte in latin1 or 2 bytes in utf8; hence the two encodings are incompatible and differ in size.) Less space means more rows fit in cache, which leads to a slight difference in performance.

For English text, 0% savings. For European text, latin1 would save only a few percent. For most of the rest of the world, utf8 is the only viable solution. For Chinese and Emoji, utf8mb4 is a must.
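These size claims are easy to check by encoding a few strings. A minimal sketch in Python, assuming Python's `latin-1` and `utf-8` codecs produce the same byte counts as MySQL's latin1 and utf8mb4 for this text; the sample strings and the `byte_len` helper are made up for illustration.

```python
# Sample strings (illustrative only).
samples = {
    "english": "Hello, world",
    "german": "Grüße aus Köln",
    "chinese": "你好",
    "emoji": "😀",
}

def byte_len(text, encoding):
    """Return the encoded size in bytes, or None if the charset cannot represent the text."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

# Map each sample name to (latin1 size, utf8 size).
sizes = {
    name: (byte_len(text, "latin-1"), byte_len(text, "utf-8"))
    for name, text in samples.items()
}

for name, (latin1, utf8) in sizes.items():
    print(f"{name:8} latin1={latin1} utf8={utf8}")
```

The English sample is the same size in both encodings, the German one grows by one byte per accented letter, and the last two cannot be stored in latin1 at all.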

Temp tables

In certain situations, the space consumed by a string expands to the potential max. For example, country_code CHAR(2) CHARACTER SET ... will take 2 bytes for ascii, 6 bytes for utf8, and 8 bytes for utf8mb4.
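The worst-case figure is just the declared length times the charset's maximum bytes per character. A small sketch in Python; the `worst_case_bytes` helper is hypothetical, and the table holds the per-character maximums (the Maxlen values reported by SHOW CHARACTER SET) for four charsets only.

```python
# Maximum bytes per character for a few MySQL character sets.
MAX_BYTES_PER_CHAR = {"ascii": 1, "latin1": 1, "utf8mb3": 3, "utf8mb4": 4}

def worst_case_bytes(char_length, charset):
    """Bytes needed when a CHAR(char_length) value is expanded to its potential maximum."""
    return char_length * MAX_BYTES_PER_CHAR[charset]

# The country_code CHAR(2) example from the answer.
for charset in MAX_BYTES_PER_CHAR:
    print(charset, worst_case_bytes(2, charset))
```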

Bottom Line:

Use ascii for country codes, hex, postal codes, uuids, md5s etc. If you are going international, and/or need Emoji, then make your "strings" utf8mb4. But do it because it is 'right', not because you will get magically marvelously much more speed; you won't. And do it whenever you create a table; it's the pits to change it later.

answered Nov 22 '22 23:11

Rick James

@RickJames is right, you should not worry about saving space by choosing ASCII or utf8 over utf8mb4.

utf8 and utf8mb4 are variable-length character encodings. This table from Wikipedia illustrates how characters automatically take 1, 2, 3, or 4 bytes each, depending on the value encoded. The leading bits of the first byte indicate how many bytes the character uses, up to 4 bytes.

[Image: Wikipedia's table of UTF-8 byte layouts]

The Wikipedia article explains it clearly:

The first 128 characters (US-ASCII) need one byte. The next 1,920 characters need two bytes to encode, which covers the remainder of almost all Latin-script alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac, Thaana and N'Ko alphabets, as well as Combining Diacritical Marks. Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use including most Chinese, Japanese and Korean characters. Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols).

You don't have to do anything to choose single-byte versus multi-byte mode. This is just the way the encoding works. Each character automatically uses the number of bytes it needs, and no more.
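To see the variable-length behavior directly, here is a rough sketch in Python that reads the length of a character from its first byte. The `utf8_sequence_length` function is a simplified illustration written for this answer, not part of any library, and it does not reject every invalid leading byte.

```python
def utf8_sequence_length(first_byte):
    """Return how many bytes a UTF-8 sequence occupies, judged from its first byte."""
    if first_byte < 0x80:           # 0xxxxxxx: ASCII, one byte
        return 1
    if 0xC0 <= first_byte < 0xE0:   # 110xxxxx: two-byte sequence
        return 2
    if 0xE0 <= first_byte < 0xF0:   # 1110xxxx: three-byte sequence
        return 3
    if 0xF0 <= first_byte < 0xF8:   # 11110xxx: four-byte sequence
        return 4
    raise ValueError("not a UTF-8 leading byte")

for ch in ["A", "é", "中", "😀"]:
    encoded = ch.encode("utf-8")
    # The length announced by the first byte matches the actual encoded length.
    print(repr(ch), utf8_sequence_length(encoded[0]), encoded.hex())
```

Nothing in the calling code chooses a "mode": the encoder picks the shortest form for each character, and the first byte tells a reader how far to advance.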

So there is no advantage to using utf8 over utf8mb4, and no advantage of using ASCII over either, unless you need to restrict the characters allowed in a string.

For what it's worth, the character set MySQL calls "utf8" is an alias for utf8mb3, which implements only the UTF-8 sequences of up to three bytes. The MySQL server team blog (https://mysqlserverteam.com/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/) says that utf8mb4 is faster, at least given performance improvements in MySQL 8.0, and utf8mb3 should be considered deprecated. MySQL 8.0.11 release notes say that utf8 will be redefined as an alias for utf8mb4 in some future version of MySQL.
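If you want to know whether existing text would survive in a utf8mb3 column, one possible check is whether every character lies in the Basic Multilingual Plane. A minimal sketch in Python; `fits_in_utf8mb3` is a hypothetical helper, not a MySQL or library function.

```python
def fits_in_utf8mb3(text):
    """True if every character needs at most 3 UTF-8 bytes (code point <= U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)

print(fits_in_utf8mb3("Grüße 你好"))   # accented Latin and common CJK stay within 3 bytes
print(fits_in_utf8mb3("thumbs up 👍"))  # emoji need 4 bytes, so utf8mb4 is required
```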

answered Nov 22 '22 23:11

Bill Karwin

Tags: mysql, character-set, mariadb, utf8mb4
