What MySQL collation is best for accepting all Unicode characters?

Tags: mysql, collation

Our column currently uses the latin1_swedish_ci collation, and special Unicode characters are, obviously, getting stripped out. We want to be able to accept characters such as U+272A ✪ and U+2764 ❤ (see this Wikipedia article). I'm leaning towards utf8_unicode_ci; would this collation handle these and other characters? I don't care about speed, as this column isn't indexed.

MySQL Version: 5.5.28-1

asked Jan 15 '13 00:01

HellaMad

1 Answer

The collation is the least of your worries; what you need to think about is the character set for the column/table/database. The collation (the rules governing how data is compared and sorted) is just a corollary of that.

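MySQL supports several Unicode character sets, utf8 and utf8mb4 being the most interesting. utf8 supports only Unicode characters in the BMP (Basic Multilingual Plane, U+0000 to U+FFFF), i.e. a subset of all of Unicode, using up to three bytes per character. utf8mb4, available since MySQL 5.5.3, supports all of Unicode, using up to four bytes per character. Both U+272A and U+2764 are in the BMP, so utf8 can store them; characters outside the BMP, such as most emoji, need utf8mb4.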
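To check which of the two character sets a given string needs, it is enough to test whether every code point lies in the BMP. The following is a minimal Python sketch of that check, not anything MySQL-specific; the function names `fits_mysql_utf8` and `utf8_length` are made up for illustration, and U+1F600 is just a sample character from outside the BMP.

```python
def fits_mysql_utf8(text: str) -> bool:
    """Return True if every character is in the BMP (code point <= U+FFFF).

    Such characters need at most three bytes in UTF-8, which is the limit
    of MySQL's utf8 character set. Anything above needs utf8mb4.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# The two characters from the question, plus one emoji outside the BMP.
for ch in ["\u272A", "\u2764", "\U0001F600"]:
    charset = "utf8" if fits_mysql_utf8(ch) else "utf8mb4"
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} bytes, minimum charset: {charset}")
```

If the column may ever receive user-entered emoji, this kind of check will fail for them, which is the usual argument for choosing utf8mb4 up front.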

The collation to be used with any of the Unicode encodings is most likely xxx_general_ci or xxx_unicode_ci, where xxx is the character set name (e.g. utf8mb4_unicode_ci). The former is a simpler, faster, language-independent sorting and comparison algorithm; the latter is a more complete language-independent algorithm, based on the Unicode Collation Algorithm, that supports more Unicode features (e.g. treating "ß" and "ss" as equivalent) but is therefore also slower.
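The "ß" versus "ss" case is an example of an expansion: one character comparing equal to a sequence of two. As a loose analogy only, Python's `str.casefold` applies Unicode full case folding, which also expands "ß" to "ss". This is not MySQL's collation algorithm and says nothing about sorting or accents; the helper name `same_after_folding` is hypothetical.

```python
def same_after_folding(a: str, b: str) -> bool:
    """Compare two strings after Unicode full case folding.

    Analogy only: this mimics the idea of an expansion-aware,
    case-insensitive comparison, not any MySQL collation.
    """
    return a.casefold() == b.casefold()


print(same_after_folding("Straße", "STRASSE"))  # True: "ß" folds to "ss"
print(same_after_folding("ß", "s"))             # False: "ss" is not "s"
```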

See https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html.

answered Oct 09 '22 11:10

deceze
