I'm confused about the MySQL Collations and its Encodings, People usualy uses one of these three collations:
What I don't understand is when to use each of these collations when needed, Like for example,
A table for names like this
[id - name]
It would only has names characters from different languages like french, german, latin. . .
Do I use for such a table the collation of utf8mb_bin
or stick with utf8mb4_unicode_520_ci
,
In other hand, A table for topics of a blog for example:
[id - title - subject]
Do I put all the columns collation to utf8mb4_unicode_520_ci
or use:
utf8mb_bin
for title
utf8mb4_unicode_520_ci
for subject
Since as I understood, the utf8mb4_unicode_520_ci
has some emotes in it that would be used in blogs subjects, Or do I just ignore everything and just use utf8mb4_unicode_520_ci
on all?
But overall, What is the point of using these different collations? And How does they affect my results in SELECT
queries?
What I would like to know in berif is:
What collation to be used for each of:
If you need a database, don't use MySQL or MariaDB. Use PostgreSQL. If you need to use MySQL or MariaDB, never use “utf8”. Always use “utf8mb4” when you want UTF-8.
Each character set has a default collation. For example, the default collations for utf8mb4 and latin1 are utf8mb4_0900_ai_ci and latin1_swedish_ci , respectively.
The utfmb4 character set has these characteristics: Supports BMP and supplementary characters. Requires a maximum of four bytes per multibyte character.
utf8mb4 : A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3 : A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utfmb4 instead.
You're confusing encoding and collation.
The available characters are defined by the encoding (and only the encoding). Since UTF-8 is a Unicode-compatible encoding, you have all characters. The MySQL peculiarity is that its utf8
encoding does not really implement UTF-8 but only a subset because it allocates 3 bytes per character and (as of today) some characters need 4 bytes. Thus utf8mb4
was born.
Collation is a set of rules that tell you how WHERE foo = bar
and ORDER BY foo
work. You need to ask yourself: if I search for internet
, should it match Internet
? If you store French, German and Latin words you most likely don't want a binary collation. Ideally you want one with the exact rules of the language you'll be using but, since you're mixing languages, you'll have to opt for a generic collation. You can make an informed decision after reading Difference between utf8mb4_unicode_ci and utf8mb4_unicode_520_ci collations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With