UTF8 string comparisons in MySQL

Question

We have issues with utf8-string comparisons in MySQL 5, regarding case and accents :

from what I gathered, what MySQL implements collations by considering that "groups of characters should be considered equal".

For example, in the utf8_unicode_ci collation, all the letters "EÉÈÊeéèê" are in the same box (together with other variants of "e").

So if you have a table containing ["video", "vidéo", "vidÉo", "vidÊo", "vidêo", "vidÈo", "vidèo", "vidEo"] (in a varchar column declared with ut8_general_ci collation) :

when asking MySQL to sort the rows according to this column, the sorting is random (MySQL does not enforce a sorting rule between "é" and "É" for example),
when asking MySQL to add a Unique Key on this column, it raises an error because it considers all the values are equal.

What setting can we fiddle with to fix these two points ?

PS : on a related note, I do not see any case sensitive collation for the utf8 charset. Did I miss something ?

[edit] I think my initial question still holds some interest, and I will leave it as is (and maybe one day get a positive answer).

It turned out, however, that our problems with string comparisons regarding accents was not linked to the collation of our text columns. It was linked to a configuration problem with the character_set_client parameter when talking with MySQL - which defaulted to latin1.

Here is the article that explained it all to us, and allowed us to fix the problem :

Getting out of MySQL character set hell

It is lengthy, but trust me, you need this length to explain both the problem and the fix.

Mchl · Accepted Answer

Use collation that considers these characters to be distinct. Maybe utf8_bin (it's case sensitive, since it does binary comparison of characters)

http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html

UTF8 string comparisons in MySQL

Tags:

mysql

case-sensitive

collation

LeGEC

1 Answers

Mchl

Recent Activity

Donate For Us

UTF8 string comparisons in MySQL

Tags:

mysql

case-sensitive

collation

LeGEC

1 Answers

Mchl

Related questions

Recent Activity

Donate For Us