utf8_bin vs. utf8_unicode_ci

People also ask

What's the difference between utf8_general_ci and utf8_unicode_ci?

In short: utf8_unicode_ci uses the Unicode Collation Algorithm as defined in the Unicode standards, whereas utf8_general_ci uses a simpler sort order that produces less accurate sorting results.

What is utf8_unicode_ci?

utf8_unicode_ci uses the standard Unicode Collation Algorithm and supports so-called expansions and ligatures. For example, the German letter ß (U+00DF LATIN SMALL LETTER SHARP S) is sorted near "ss", and the letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".

What is utf8_general_ci?

utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.
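The difference between a one-to-one comparison and one that supports expansions can be illustrated outside MySQL. The following is a minimal Python sketch, not MySQL's actual algorithm: it assumes that `str.lower()` is a fair stand-in for a one-character-to-one-character mapping and that `str.casefold()` is a fair stand-in for a mapping that may expand one character into several. The function names are made up for this illustration.

```python
def one_to_one_key(s: str) -> str:
    # lower() keeps "ß" as a single character, so it never matches "ss".
    return s.lower()


def expanding_key(s: str) -> str:
    # casefold() expands "ß" into "ss", a one-to-many mapping.
    return s.casefold()


a, b = "Straße", "STRASSE"

one_to_one_match = one_to_one_key(a) == one_to_one_key(b)
expanding_match = expanding_key(a) == expanding_key(b)

print(one_to_one_match)  # the one-to-one keys differ
print(expanding_match)   # the expanding keys are identical
```

Only the expanding comparison treats the two spellings as equal, which is the kind of behavior the text above attributes to utf8_unicode_ci for ß and "ss".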

It depends on what you need.

The utf8_bin collation compares strings based purely on their Unicode code point values. If all of the code points have the same values, then the strings are equal. However, this falls apart when you have strings with different composition for combining marks (composed vs. decomposed) or characters that are canonically equivalent but don't have the same code point value. In some cases, using utf8_bin will result in strings not matching when you expect them to. Theoretically, utf8_bin is the fastest because no Unicode normalization is applied to the strings, but it may not be what you want.
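To make the composed vs. decomposed problem concrete, here is a small Python sketch. It is only an analogy for what a binary collation does (it compares the UTF-8 bytes directly) and does not run against MySQL; the helper names are invented for the example.

```python
import unicodedata

# The same visible character "é", written two ways.
composed = "\u00e9"     # one code point: U+00E9
decomposed = "e\u0301"  # two code points: U+0065 followed by U+0301


def binary_equal(a: str, b: str) -> bool:
    # Byte-for-byte comparison, with no normalization applied.
    return a.encode("utf-8") == b.encode("utf-8")


def normalized_equal(a: str, b: str) -> bool:
    # Normalize both sides to NFC before comparing.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


raw_match = binary_equal(composed, decomposed)
nfc_match = normalized_equal(composed, decomposed)

print(raw_match)  # the raw bytes differ
print(nfc_match)  # the normalized forms are identical
```

The two strings look the same on screen, yet a byte-level comparison treats them as different values. If the data may arrive in mixed forms, normalizing it in the application before storing it is one way to keep a binary collation predictable.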

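utf8_general_ci compares strings case-insensitively using simplified rules: it compares one character at a time and applies neither language-specific rules nor the full Unicode Collation Algorithm. Stock MySQL does not ship a utf8_general_cs collation, so for case-sensitive comparison use utf8_bin or the BINARY keyword.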

Personally, I would go with utf8_unicode_ci if you expect that letter case is generally not important for the results you want to find.

Collations aren't only used at runtime, but also when MySQL builds indexes. So if any of these columns appear in an index, finding data according to the comparison rules of that collation will be pretty much as fast as it ever gets.

In cases where you do not want case-insensitive matching, do not apply upper or lower. Instead, put the BINARY keyword in front of the utf8 column to force a literal code-point comparison rather than one according to the collation.

mysql> create table utf8 (name varchar(24) charset utf8 collate utf8_general_ci, primary key (name));
Query OK, 0 rows affected (0.14 sec)

mysql> insert into utf8 values ('Roland');
Query OK, 1 row affected (0.00 sec)

mysql> insert into utf8 values ('roland');
ERROR 1062 (23000): Duplicate entry 'roland' for key 'PRIMARY'
mysql> select * from utf8 where name = 'roland';
+--------+
| name   |
+--------+
| Roland |
+--------+
1 row in set (0.00 sec)

mysql> select * from utf8 where binary name = 'roland';
Empty set (0.01 sec)

This should be much faster than using lower or upper, since in those cases MySQL first needs to make a copy of the column value and modify its letter case before applying the comparison. With BINARY in place, MySQL does a code-point-by-code-point comparison until it finds that the values are not equal, which will generally be faster. One caveat: casting an indexed column with BINARY may prevent MySQL from using the index efficiently, so it is worth checking the query plan with EXPLAIN.

I was using 'utf8_unicode_ci', which is the default in Doctrine, and I had to change it to:

 * @ORM\Table(name = "Table", options={"collate"="utf8_bin"})

This was because some of my composite primary keys consisted of text fields. Sadly, 'utf8_unicode_ci' resolved "poistný" and "poistny" to the same primary key value, and Doctrine crashed on the insert flush. I could not simply change the collation of one part of the composite primary key; I had to drop the table and recreate it. I hope this saves someone else some time.
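The key collision described here can be mimicked in a few lines of Python. This is a rough sketch and not how MySQL computes collation weights: it assumes that an accent- and case-insensitive collation behaves roughly like "strip combining marks, then fold case", and both helper functions are hypothetical.

```python
import unicodedata


def accent_insensitive_key(s: str) -> str:
    # Crude approximation: decompose, drop combining marks, then fold case.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def binary_key(s: str) -> bytes:
    # A binary collation effectively keys on the raw encoded bytes.
    return s.encode("utf-8")


rows = ["poistn\u00fd", "poistny"]  # "poistný" and "poistny"

ci_keys = {accent_insensitive_key(r) for r in rows}
bin_keys = {binary_key(r) for r in rows}

print(len(ci_keys))   # both rows collapse to a single key
print(len(bin_keys))  # each row keeps its own key
```

With the accent-insensitive key, the two values count as duplicates, which is why a primary or unique key rejects the second insert. With the binary key they remain distinct.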
