
MySQL collation for all languages

I'm currently developing a website that will display content in almost any language in the world, and I'm having trouble choosing the best collation to define in MySQL.

Which one is the best to support all characters? Or the most accurate?

Or is it best just to convert all characters to Unicode?

Pedro Luz asked Sep 20 '09 11:09


2 Answers

The accepted answer is wrong (maybe it was right in 2009).

utf8mb4_unicode_ci is the best collation to use for wide language support (utf8mb4 is the character set; unicode_ci is the comparison rule applied to it).

Reasoning and supporting evidence:

You want to use utf8mb4 rather than utf8 because the latter stores at most 3 bytes per character, and you want to support 4-byte characters such as emoji and other code points outside the Basic Multilingual Plane. (ref)

and

You want to use a unicode collation rather than a general one because utf8mb4_general_ci takes shortcuts for speed and does not sort every language correctly. (ref)
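To make the 3-byte versus 4-byte point concrete, here is a minimal Python sketch that reports how many bytes each character takes in UTF-8. The helper name `needs_utf8mb4` and the sample strings are made up for illustration; nothing here talks to MySQL, it only shows which text would not fit in a 3-byte `utf8` column.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character in text takes 4 bytes in UTF-8.

    Hypothetical helper: such characters cannot be stored in MySQL's
    3-byte utf8 character set and need utf8mb4 instead.
    """
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


# Illustrative samples, written with escapes so the file encoding does not matter.
samples = [
    "hello",               # ASCII: 1 byte per character
    "caf\u00e9",           # e with acute accent: 2 bytes
    "\u65e5\u672c\u8a9e",  # Japanese: 3 bytes per character
    "\U0001F600",          # emoji: 4 bytes
    "\U0001D11E",          # musical G clef: 4 bytes
]

for s in samples:
    widths = [len(ch.encode("utf-8")) for ch in s]
    print(ascii(s), widths, "needs utf8mb4" if needs_utf8mb4(s) else "fits in utf8")
```

If existing tables already use the 3-byte character set, MySQL can convert them with a statement of the form `ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;` (`tbl_name` is a placeholder; test on a copy first).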

Gerbus answered Sep 29 '22 23:09


I generally use UTF-8 (8-bit UCS/Unicode Transformation Format), which works perfectly for any (well, most) languages:

utf8_general_ci

http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html

stone answered Sep 29 '22 23:09