 

Best default collation for a multilingual database

Tags:

sql-server

I am a bit confused about the default collations when creating a database. The data stored in the database will be in different languages. The main users of the database will work in Spanish, but it will also be used in English, French and other languages. The Spanish default collation is Modern_Spanish_CI_AS, while English, French, Italian and others default to Latin1_General_CI_AS. Which collation should I use, and are there any drawbacks to choosing one over the other?

Many thanks for your help. Regards,

Javier

asked Sep 06 '10 at 10:09 by javier



1 Answer

A collation has two effects:

  1. For non-Unicode data types, it determines the code page of the data, i.e. which characters you can store in the column or variable
  2. For all data types, it affects how data is sorted and compared, i.e. ORDER BY and equality tests

To avoid problems with the first issue, always store and manipulate Unicode data using the nchar/nvarchar data types, because then the collation's code page no longer limits which characters you can store. It requires more disk space, but it avoids some really awkward issues, so for most people it's probably a good tradeoff.
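
As a rough illustration of the first point, the sketch below stores the same string in a varchar and an nvarchar column. The table name #codepage_demo and the sample text are made up for this example, and the varchar column is pinned to Latin1_General_CI_AS (code page 1252) so the result does not depend on your database default.

```sql
-- Hypothetical temp table: the same text in a non-Unicode and a Unicode column.
create table #codepage_demo (
    v varchar(20) collate Latin1_General_CI_AS,  -- limited to code page 1252
    n nvarchar(20)                               -- Unicode, no code page limit
)

insert into #codepage_demo (v, n) values (N'Привет', N'Привет')

-- Cyrillic letters have no mapping in code page 1252, so the varchar column
-- cannot keep them (they are replaced on insert); the nvarchar column can.
select v, n from #codepage_demo

drop table #codepage_demo
```

Comparing the two columns in the result should make it clear why the code page only matters for the non-Unicode types.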

For the second issue, use the collation that makes the most sense for your database, i.e. the one that sorts and compares the data the way you want most of the time. For example, if you know that case-sensitive comparisons will be important, then Latin1_General_CS_AS might be a better choice.

And you can always use COLLATE to specify the collation explicitly if you need more control over specific queries:

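-- Sample names chosen to show how 'Ch' is sorted
create table #t (name nvarchar(100))

insert into #t select N'Che'
insert into #t select N'Carlos'
insert into #t select N'Cruz'

-- Modern Spanish sorts letter by letter: Carlos, Che, Cruz
select name from #t order by name collate Modern_Spanish_CI_AS
-- Traditional Spanish treats 'Ch' as one letter after 'C': Carlos, Cruz, Che
select name from #t order by name collate Traditional_Spanish_CI_AS

drop table #t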
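
COLLATE can be used in comparisons as well as in ORDER BY. As a minimal sketch of the case-sensitivity point mentioned earlier, the query below compares the same two literals under a case-insensitive and a case-sensitive collation; the column aliases ci_result and cs_result are just illustrative names.

```sql
-- Same literals, compared under two different collations.
select
    case when N'abc' = N'ABC' collate Latin1_General_CI_AS
         then 'equal' else 'different' end as ci_result,  -- case-insensitive
    case when N'abc' = N'ABC' collate Latin1_General_CS_AS
         then 'equal' else 'different' end as cs_result   -- case-sensitive
```

The first column should report the strings as equal and the second as different, which is the behaviour you would be choosing database-wide by picking a CI or CS default.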

If you don't know how text data will be sorted or compared, and your users don't know either, then I would just stay with your default collation (and use Unicode!); in the worst case, you can always move the data to a new table with the correct collation. There is also a lot of documentation on collations in Books Online that is worth reading.
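
If you want to see what your current defaults are, or try changing a column later, something like the following sketch may help. It uses a throwaway temp table, #people, invented for this example. Note that it changes the column in place with ALTER COLUMN rather than moving the data to a new table as described above, and that in a real database the change can be blocked if indexes or constraints depend on the column.

```sql
-- Collation of the server and of the current database.
select serverproperty('Collation') as server_collation,
       databasepropertyex(db_name(), 'Collation') as database_collation

-- Hypothetical table with an explicit starting collation.
create table #people (name nvarchar(100) collate Latin1_General_CI_AS not null)

-- Change the collation of the existing column in place.
alter table #people
    alter column name nvarchar(100) collate Modern_Spanish_CI_AS not null

-- Check the column's collation (temp tables live in tempdb).
select name, collation_name
from tempdb.sys.columns
where object_id = object_id('tempdb..#people')

drop table #people
```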

answered Oct 23 '22 at 05:10 by Pondlife