
Mysql unique index doesn't work on a certain umlaut

I have a users table with a column called 'nickname', utf-8 encoded, varchar(20); the table uses InnoDB. There are 2 records: one has nickname = 'gunni' and the other has nickname = 'günni'. When I tried to add a unique index on this column, MySQL gave me this error:

ERROR 1062 (23000) at line 263: Duplicate entry 'gunni' for key 2

I checked the data: there's only one record with the name 'gunni', and if I change the 'günni' record to something else and then apply the unique index again, everything works fine.

How can 'günni' and 'gunni' be duplicates? Here are their hex values, obtained with MySQL's hex() function:

gunni -> 67756E6E69

günni -> 67C3BC6E6E69

They're obviously different. Why does MySQL treat these two as the same? Is there something I don't know about unique indexes? Or could this even be a MySQL bug?

Shawn asked Jul 26 '10 05:07




1 Answer

It's because of the collation you are using.

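Any collation whose name ends in _ci is case-insensitive, and unless the name says otherwise it is also accent- and umlaut-insensitive (utf8_general_ci is a common example). So yes, MySQL will consider "günni" and "gunni" the same value, and a unique index on that column will reject them as duplicates, unless you change the column to a collation that distinguishes them, such as the binary collation utf8_bin.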

Docs: http://dev.mysql.com/doc/refman/5.0/en/charset-table.html
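
To see why the two nicknames collide even though their bytes differ, here is a rough Python sketch. It is only an approximation of a case- and accent-insensitive comparison, not MySQL's actual collation algorithm; the helper names `ci_ai_key` and `utf8_hex` are made up for this illustration.

```python
import unicodedata


def ci_ai_key(text: str) -> str:
    """Approximate a case- and accent-insensitive comparison key.

    Assumption: this only mimics the general idea behind a *_ci collation;
    MySQL uses its own collation tables, not this procedure.
    """
    # Split characters such as 'ü' into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so that 'G' and 'g' compare equal.
    return stripped.casefold()


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as uppercase hex, like MySQL's hex()."""
    return text.encode("utf-8").hex().upper()


if __name__ == "__main__":
    for name in ["gunni", "g\u00fcnni"]:
        # The hex values differ, but the comparison keys are identical.
        print(name, utf8_hex(name), ci_ai_key(name))
```

The stored bytes differ, as the hex values in the question show, but a unique index compares values under the column's collation, so both rows map to the same key. A binary collation compares the bytes themselves, which is why it keeps the two nicknames distinct.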

NullUserException answered Oct 08 '22 12:10