Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When to use utf8mb4 (bin, general_ci, unicode_520_ci)? [duplicate]

I'm confused about the MySQL Collations and its Encodings, People usualy uses one of these three collations:

  1. utf8mb_bin
  2. utf8mb4_general_ci
  3. utf8mb4_unicode_520_ci

What I don't understand is when to use each of these collations when needed, Like for example,

A table for names like this

[id - name]

It would only has names characters from different languages like french, german, latin. . .

Do I use for such a table the collation of utf8mb_bin or stick with utf8mb4_unicode_520_ci,

In other hand, A table for topics of a blog for example:

[id - title - subject]

Do I put all the columns collation to utf8mb4_unicode_520_ci or use:

utf8mb_bin for title

utf8mb4_unicode_520_ci for subject

Since as I understood, the utf8mb4_unicode_520_ci has some emotes in it that would be used in blogs subjects, Or do I just ignore everything and just use utf8mb4_unicode_520_ci on all?

But overall, What is the point of using these different collations? And How does they affect my results in SELECT queries?

What I would like to know in berif is:

What collation to be used for each of:

  1. names
  2. titles
  3. subjects
  4. emails
  5. bios
  6. messages
  7. usernames
like image 381
Toleo Avatar asked Jul 19 '18 16:07

Toleo


People also ask

Should I use utf8mb4 or UTF-8?

If you need a database, don't use MySQL or MariaDB. Use PostgreSQL. If you need to use MySQL or MariaDB, never use “utf8”. Always use “utf8mb4” when you want UTF-8.

What collation should I use for utf8mb4?

Each character set has a default collation. For example, the default collations for utf8mb4 and latin1 are utf8mb4_0900_ai_ci and latin1_swedish_ci , respectively.

What does the utf8mb4 character set allow for?

The utfmb4 character set has these characteristics: Supports BMP and supplementary characters. Requires a maximum of four bytes per multibyte character.

What is utf8mb4 in MySQL?

utf8mb4 : A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3 : A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utfmb4 instead.


1 Answers

You're confusing encoding and collation.

The available characters are defined by the encoding (and only the encoding). Since UTF-8 is a Unicode-compatible encoding, you have all characters. The MySQL peculiarity is that its utf8 encoding does not really implement UTF-8 but only a subset because it allocates 3 bytes per character and (as of today) some characters need 4 bytes. Thus utf8mb4 was born.

Collation is a set of rules that tell you how WHERE foo = bar and ORDER BY foo work. You need to ask yourself: if I search for internet, should it match Internet? If you store French, German and Latin words you most likely don't want a binary collation. Ideally you want one with the exact rules of the language you'll be using but, since you're mixing languages, you'll have to opt for a generic collation. You can make an informed decision after reading Difference between utf8mb4_unicode_ci and utf8mb4_unicode_520_ci collations.

like image 170
Álvaro González Avatar answered Oct 06 '22 23:10

Álvaro González