Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between utf8mb4 and utf8 charsets in MySQL?

What is the difference between utf8mb4 and utf8 charsets in MySQL?

I already know about ASCII, UTF-8, UTF-16 and UTF-32 encodings; but I'm curious to know whats the difference of utf8mb4 group of encodings with other encoding types defined in MySQL Server.

Are there any special benefits/proposes of using utf8mb4 rather than utf8?

like image 823
Mojtaba Rezaeian Avatar asked May 06 '15 10:05

Mojtaba Rezaeian


People also ask

Should I use UTF-8 or utf8mb4?

MySQL supports multiple Unicode character sets: utf8mb4 : A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3 : A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utfmb4 instead.

Is utf8mb4 backwards compatible with UTF-8?

My recommendation is to convert all tables to utf8mb4 for full UTF-8 support. Also, utf8mb4 is backwards compatible with utf8 .

What is the difference between UTF-8 and utf8mb3?

UTF-8 is a variable-length encoding. In the case of UTF-8, this means that storing one code point requires one to four bytes. However, MySQL's encoding called "utf8" (alias of "utf8mb3") only stores a maximum of three bytes per code point.

How do I change utf8mb4 to UTF-8?

To solve the problem open the exported SQL file, search and replace the utf8mb4 with utf8 , after that search and replace the utf8mb4_unicode_520_ci with utf8_general_ci . Save the file and import it into your database. After that, change the wp-config. php charset option to utf8 , and the magic starts.

What is utf8mb4 encoding in MySQL?

Few years later, when MySQL 5.5.3 was released, they introduced a new encoding called utf8mb4, which is actually the real 4-byte utf8 encoding that you know and love. if you're using MySQL (or MariaDB or Percona Server), make sure you know your encodings.

How many bytes per character does the character set utf8utf8mb3 contain?

The character set named utf8utf8mb3] uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character supports supplemental characters:

How many bytes does a character set use in MySQL?

As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character supports supplemental characters: For a BMP character, utf8utf8mb3] and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.

Why doesn't utf8mb3 support Unicode code points?

So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the range 0x000 to 0xFFFF, which is called the " Basic Multilingual Plane ". See also Comparison of Unicode encodings.


Video Answer


2 Answers

UTF-8 is a variable-length encoding. In the case of UTF-8, this means that storing one code point requires one to four bytes. However, MySQL's encoding called "utf8" (alias of "utf8mb3") only stores a maximum of three bytes per code point.

So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the range 0x000 to 0xFFFF, which is called the "Basic Multilingual Plane". See also Comparison of Unicode encodings.

This is what (a previous version of the same page at) the MySQL documentation has to say about it:

The character set named utf8[/utf8mb3] uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character supports supplemental characters:

  • For a BMP character, utf8[/utf8mb3] and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.

  • For a supplementary character, utf8[/utf8mb3] cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8[/utf8mb3] cannot store the character at all, you do not have any supplementary characters in utf8[/utf8mb3] columns and you need not worry about converting characters or losing data when upgrading utf8[/utf8mb3] data from older versions of MySQL.

So if you want your column to support storing characters lying outside the BMP (and you usually want to), such as emoji, use "utf8mb4". See also What are the most common non-BMP Unicode characters in actual use?.

like image 72
CodeCaster Avatar answered Sep 25 '22 22:09

CodeCaster


The utf8mb4 character set is useful because nowadays we need support for storing not only language characters but also symbols, newly introduced emojis, and so on.

A nice read on How to support full Unicode in MySQL databases by Mathias Bynens can also shed some light on this.

like image 23
Jimmy Kane Avatar answered Sep 26 '22 22:09

Jimmy Kane