I know there are a lot of similar questions on SF, but I think mine is different enough to warrant a new question. I have a table that has a single column as utf8 with utf8_unicode_ci. It also has a unique key on this column along with another column marking a language code. The data in the column is in many different scripts (Latin with various accents, Chinese, and Russian, among others).
The problem is that I will sometimes want to enter in two words with different meanings that only differ by a diacritic (i.e. Spanish ano vs año). Since utf8_unicode_ci is both case and accent insensitive, it thinks these are the same and will only let me enter one. That sucks. Ideally, I'd just switch the whole column over to some collation that's case INsensitive but accent sensitive, but that doesn't seem to exist. A lot of different stuff uses this column, so I'd rather not change the column's default collation to utf8_bin for fear of messing stuff up with case sensitivity.
So, all of that said, I need some solution that will not affect default case sensitivity in the many existing queries that hit this column, but will let me add in words that differ by only a diacritic. Ideas? I'll switch just the unique key constraint to utf8_bin if I have to, but I'd rather not as I never want two things in the table that differ only by case.
By default, it depends on the operating system and its case sensitivity. This means MySQL is case-insensitive in Windows and macOS, while it is case-sensitive in most Linux systems. However, you can change the behavior by changing collation.
Requiring or making a distinction between accented and unaccented forms of a letter and between different accents.
utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.
You don't have to reinvent the flat tire (reinvent the wheel) here.
There are two Spanish-language collations in MySQL:
utf8_spanish_ci (modern Spanish) and utf8_spanish2_ci (traditional Spanish)
These know the language well enough to know that N and n should be collated together, but that Ñ and ñ are different letters between N and O. In Spanish, the Ñ is actually a different letter, not an accent.
Set your column collation to utf8_spanish_ci and all will work the way you want.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With