As far as I understand, in MySQL the unicode_ci collations (utf8_unicode_ci in particular) are meant to support all characters regardless of locale.
I need to achieve the same with SQL Server 2008 R2. My database is going to contain data in very different languages (not limited to Latin-based alphabets). I am not going to use non-Unicode strings at all. What collation should I choose?
You might as well go with Latin1_General_CI_AI
The reason is that Unicode data is stored in NVARCHAR columns. SQL Server is more flexible in that it can mix VARCHAR (1 byte per character in a single-byte code page) and NVARCHAR (2 bytes per character) data. The collation's code page only matters for VARCHAR data, so to match what UTF8 gives you in MySQL, any collation would do as long as you stick to NVARCHAR. As for CI: every non-binary collation in 2008 allows the CI specification to be added (it is the "case sensitive" checkbox in the UI, unchecked for insensitive).
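To illustrate the storage point, here is a minimal Python sketch (not SQL Server code). It assumes that NVARCHAR data is held as 2-byte UTF-16 code units and that Latin1_General's VARCHAR code page is 1252; the sample string and the helper name fits_code_page are made up for the example.

```python
# Sample text mixing Cyrillic, Latin punctuation and CJK (illustrative only).
text = "Привет, 世界"

# NVARCHAR-style storage: every character here takes one 2-byte UTF-16 unit.
utf16_bytes = text.encode("utf-16-le")

def fits_code_page(s, code_page="cp1252"):
    """Return True if s can be stored in the given single-byte code page.

    Hypothetical helper: it mimics what would happen to VARCHAR data
    under a collation whose code page is 1252.
    """
    try:
        s.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

print(len(text), len(utf16_bytes))   # character count vs. byte count
print(fits_code_page(text))          # Cyrillic/CJK do not fit in cp1252
print(fits_code_page("Hello"))       # plain ASCII fits
```

The UTF-16 encoding succeeds for every character, while the 1252 code page rejects the Cyrillic and CJK text. That is why the collation's code page stops mattering once all columns are NVARCHAR.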
The last bit (AI, accent insensitive) and some others, like width sensitivity, are just additional tuning on SQL Server.
Point #2 from http://forums.mysql.com/read.php?103,187048,188748:
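As a rough idea of what the CI and AI flags mean for comparisons, here is a Python approximation. It is only a sketch: it is not SQL Server's actual comparison algorithm, the function names ci_ai_key and ci_ai_equal are invented for the example, and stripping combining marks may treat some non-Latin letters differently than a real collation does.

```python
import unicodedata

def ci_ai_key(s):
    """Build a comparison key that ignores case and accents (approximation)."""
    # Decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop the combining marks: this is the "accent insensitive" part.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case: this is the "case insensitive" part.
    return stripped.casefold()

def ci_ai_equal(a, b):
    """Compare two strings the way a CI_AI collation roughly would."""
    return ci_ai_key(a) == ci_ai_key(b)

print(ci_ai_equal("Résumé", "resume"))   # True
print(ci_ai_equal("resume", "resumes"))  # False
```

With a CI_AS collation only the case folding step would apply, and with CS_AS neither step would.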
utf8_unicode_ci is fine for all these languages: Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian.
If you require sorting for a particular language, where languages handle accents differently, you need a collation with that language's dictionary order; see http://msdn.microsoft.com/en-us/library/ms144250.aspx. Otherwise, Latin1_General uses the US English (Latin) dictionary rules.