When to use utf8mb4 (bin, general_ci, unicode_520_ci)? [duplicate]

Tags:

mysql

character-encoding

collation

utf8mb4

I'm confused about MySQL collations and encodings. People usually use one of these three collations:

utf8mb4_bin
utf8mb4_general_ci
utf8mb4_unicode_520_ci

What I don't understand is when to use each of these collations. For example,

a table of names like this:

[id - name]

It would only hold names with characters from different languages such as French, German, Latin...

Do I use utf8mb4_bin for such a table, or stick with utf8mb4_unicode_520_ci?

On the other hand, take a table for blog topics, for example:

[id - title - subject]

Do I set the collation of all the columns to utf8mb4_unicode_520_ci, or use:

utf8mb4_bin for title

utf8mb4_unicode_520_ci for subject

As I understand it, utf8mb4_unicode_520_ci includes some emoji that would be used in blog subjects. Or do I just ignore all this and use utf8mb4_unicode_520_ci for everything?

But overall, what is the point of using these different collations? And how do they affect my results in SELECT queries?

In brief, what I would like to know is:

Which collation should be used for each of:

names
titles
subjects
emails
bios
messages
usernames

381

asked Jul 19 '18 16:07

Toleo

1 Answer

You're confusing encoding and collation.

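The available characters are defined by the encoding (and only the encoding). UTF-8 can encode every Unicode character, so a complete UTF-8 implementation gives you all of them. The MySQL peculiarity is that its utf8 encoding (an alias for utf8mb3) does not implement full UTF-8 but only a subset: it stores at most 3 bytes per character, while characters outside the Basic Multilingual Plane, such as most emoji, need 4 bytes. Thus utf8mb4 was born. So whether you can store emoji depends on using utf8mb4, not on which collation you pick.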
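A minimal sketch of the byte widths involved, assuming a MySQL server where utf8mb4 is available. The strings are built from their UTF-8 byte sequences with UNHEX so the result does not depend on the client's connection character set; the column aliases are made up for illustration.

```sql
-- CHAR_LENGTH counts characters, LENGTH counts bytes.
-- C3 A9       is the UTF-8 encoding of U+00E9 (e with acute accent).
-- F0 9F 98 80 is the UTF-8 encoding of U+1F600 (an emoji outside the BMP).
SELECT
  CHAR_LENGTH(CONVERT(UNHEX('C3A9') USING utf8mb4))     AS e_acute_chars,  -- 1
  LENGTH(CONVERT(UNHEX('C3A9') USING utf8mb4))          AS e_acute_bytes,  -- 2
  CHAR_LENGTH(CONVERT(UNHEX('F09F9880') USING utf8mb4)) AS emoji_chars,    -- 1
  LENGTH(CONVERT(UNHEX('F09F9880') USING utf8mb4))      AS emoji_bytes;    -- 4
```

The 4-byte case is the one that the 3-byte utf8 encoding cannot represent, regardless of collation.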

Collation is a set of rules that determine how comparisons such as WHERE foo = bar and sorts such as ORDER BY foo work. You need to ask yourself: if I search for internet, should it match Internet? A binary collation compares the encoded values directly, so those two strings would not match. If you store French, German and Latin words you most likely don't want a binary collation. Ideally you want one with the exact rules of the language you'll be using but, since you're mixing languages, you'll have to opt for a generic collation. You can make an informed decision after reading "Difference between utf8mb4_unicode_ci and utf8mb4_unicode_520_ci collations".
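The internet/Internet question can be tried directly. This is only a sketch: the table name people and its columns are hypothetical, and it assumes a server that ships utf8mb4_unicode_520_ci.

```sql
-- Same two strings, compared under a binary and a case-insensitive collation.
SELECT
  _utf8mb4'internet' COLLATE utf8mb4_bin            = _utf8mb4'Internet' AS bin_match,  -- 0
  _utf8mb4'internet' COLLATE utf8mb4_unicode_520_ci = _utf8mb4'Internet' AS ci_match;   -- 1

-- Hypothetical table: the collation is declared per column...
CREATE TABLE people (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci NOT NULL
);

-- ...and is used by default for comparisons and sorting on that column.
SELECT name FROM people WHERE name = 'internet' ORDER BY name;

-- It can be overridden for a single query when an exact match is wanted.
SELECT name FROM people WHERE name COLLATE utf8mb4_bin = 'internet';
```

Declaring the generic collation on the column and overriding it per query keeps the default behaviour forgiving while still allowing a strict comparison where one is needed.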

170

answered Oct 06 '22 23:10

Álvaro González
