Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL Case Insensitive but Accent Sensitive UTF8 Unique Key

I know there are a lot of similar questions on SF, but I think mine is different enough to warrant a new question. I have a table that has a single column as utf8 with utf8_unicode_ci. It also has a unique key on this column along with another column marking a language code. The data in the column is in many different scripts (Latin with various accents, Chinese, and Russian, among others).

The problem is that I will sometimes want to enter in two words with different meanings that only differ by a diacritic (i.e. Spanish ano vs año). Since utf8_unicode_ci is both case and accent insensitive, it thinks these are the same and will only let me enter one. That sucks. Ideally, I'd just switch the whole column over to some collation that's case INsensitive but accent sensitive, but that doesn't seem to exist. A lot of different stuff uses this column, so I'd rather not change the column's default collation to utf8_bin for fear of messing stuff up with case sensitivity.

So, all of that said, I need some solution that will not affect default case sensitivity in the many existing queries that hit this column, but will let me add in words that differ by only a diacritic. Ideas? I'll switch just the unique key constraint to utf8_bin if I have to, but I'd rather not as I never want two things in the table that differ only by case.

like image 286
Eli Avatar asked Jun 12 '12 22:06

Eli


People also ask

Is MySQL select case sensitive?

By default, it depends on the operating system and its case sensitivity. This means MySQL is case-insensitive in Windows and macOS, while it is case-sensitive in most Linux systems. However, you can change the behavior by changing collation.

What is accent sensitive?

Requiring or making a distinction between accented and unaccented forms of a letter and between different accents.

What is utf8_general_ci?

utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.


1 Answers

You don't have to reinvent the flat tire (reinvent the wheel) here.

There are two Spanish-language collations in MySQL:

utf8_spanish_ci (modern Spanish) and utf8_spanish2_ci (traditional Spanish)

These know the language well enough to know that N and n should be collated together, but that Ñ and ñ are different letters between N and O. In Spanish, the Ñ is actually a different letter, not an accent.

Set your column collation to utf8_spanish_ci and all will work the way you want.

like image 67
O. Jones Avatar answered Sep 28 '22 16:09

O. Jones