
What collation to use for SQL Server database?

Which collation do I need to choose for SQL Server 2008?

I found a helpful related post on stackoverflow.com regarding this question: How to choose collation of SQL Server database

So if I understand correctly (see the link above):

  • Collation is used for sorting and comparing.
  • NVARCHAR is used for storing data.

Collation properties/parameters:

  • CI specifies case-insensitive
  • CS specifies case-sensitive
  • AI specifies accent-insensitive
  • AS specifies accent-sensitive
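A minimal sketch of how these suffixes change comparisons, assuming a SQL Server instance where the Latin1_General_100 collations are available (SQL Server 2008 or later). The string literals are made up for illustration.

```sql
-- Compare the same pairs of strings under different sensitivity suffixes.
-- COLLATE on one operand sets the collation used for that comparison.
SELECT
    CASE WHEN N'abc' = N'ABC' COLLATE Latin1_General_100_CI_AI
         THEN 'equal' ELSE 'different' END AS case_under_ci,    -- CI: case is ignored
    CASE WHEN N'abc' = N'ABC' COLLATE Latin1_General_100_CS_AS
         THEN 'equal' ELSE 'different' END AS case_under_cs,    -- CS: case matters
    CASE WHEN N'resume' = N'résumé' COLLATE Latin1_General_100_CI_AI
         THEN 'equal' ELSE 'different' END AS accent_under_ai,  -- AI: accents are ignored
    CASE WHEN N'resume' = N'résumé' COLLATE Latin1_General_100_CI_AS
         THEN 'equal' ELSE 'different' END AS accent_under_as;  -- AS: accents matter
```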

I need to create a database that will store Turkish and English text. I'll choose CI and AI, since I want neither case sensitivity nor accent sensitivity. That seems straightforward for English, but Turkish has some special characters such as ü, ç and ö.

Question:

Since collation is not related to STORING data and I'll use NVARCHAR, why should I choose the collation Turkish_100_CI_AI when I could also use Latin1_General_100_CI_AI, which is the default on my SQL Server? Both are for Latin script.

The same question applies to storing ENGLISH and FRENCH in the same database: why should I use French_100_CI_AI instead of Latin1_General_100_CI_AI?

Can someone advise? Am I wrong?

ethem asked Jul 29 '11 10:07




2 Answers

You can set the collation explicitly for each column using the COLLATE clause, if your data model allows you to separate data into language-specific columns.

You can also apply the COLLATE clause to a SELECT statement (e.g. you keep all language data in the same place, and only filter by language in a SELECT).
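Both options can be sketched as follows. The table and column names (dbo.ProductName, dbo.LocalizedName, Lang, and so on) are hypothetical and only serve the illustration; the collation names are the ones mentioned in the question.

```sql
-- Option 1: one column per language, each with its own collation.
CREATE TABLE dbo.ProductName
(
    ProductId INT           NOT NULL PRIMARY KEY,
    NameEn    NVARCHAR(200) COLLATE Latin1_General_100_CI_AI NOT NULL,
    NameTr    NVARCHAR(200) COLLATE Turkish_100_CI_AI        NOT NULL
);

-- Option 2: all languages share one column that uses the database default
-- collation; the language-specific collation is applied at query time.
CREATE TABLE dbo.LocalizedName
(
    NameId INT           NOT NULL PRIMARY KEY,
    Lang   CHAR(2)       NOT NULL,   -- e.g. 'en' or 'tr'
    Name   NVARCHAR(200) NOT NULL
);

SELECT Name
FROM dbo.LocalizedName
WHERE Lang = 'tr'
ORDER BY Name COLLATE Turkish_100_CI_AI;
```

One trade-off to keep in mind: an index on a column is ordered by that column's own collation, so a query-time COLLATE that differs from it generally cannot use the index for sorting or seeking.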

As far as I'm aware, the Turkish sort order is not covered by Latin1.
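One way to check this on your own server is to run the same comparison and the same sort under both collations. In the Turkish alphabet ç, ö and ü are separate letters that follow c, o and u, and dotted i / dotless ı form their own case pairs (i/İ and ı/I), so the Turkish collation is expected to behave differently here. This is a sketch with made-up sample words, not a statement of what your instance will return.

```sql
-- Is lowercase i equal to uppercase I under a case-insensitive collation?
SELECT
    CASE WHEN N'i' = N'I' COLLATE Latin1_General_100_CI_AS
         THEN 'equal' ELSE 'different' END AS latin1_result,
    CASE WHEN N'i' = N'I' COLLATE Turkish_100_CI_AS
         THEN 'equal' ELSE 'different' END AS turkish_result;

-- Sort the same sample words under each collation and compare the row order.
SELECT v.word
FROM (VALUES (N'cuma'), (N'çay'), (N'oy'), (N'öd')) AS v(word)
ORDER BY v.word COLLATE Latin1_General_100_CI_AI;

SELECT v.word
FROM (VALUES (N'cuma'), (N'çay'), (N'oy'), (N'öd')) AS v(word)
ORDER BY v.word COLLATE Turkish_100_CI_AI;
```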

devio answered Sep 21 '22 08:09


Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence, with options for specifying case-sensitivity, accent marks, kana character types and character width.

Case sensitivity

If A and a, B and b, etc. are treated as the same, the comparison is case-insensitive. Without such a rule a computer treats A and a differently because they have different character codes: in ASCII, A is 65 and a is 97.

Accent sensitivity

If a and á, or o and ó, are treated as the same, the comparison is accent-insensitive. Without such a rule a computer treats a and á differently because they have different character codes: a is 97 in ASCII, while á lies outside the ASCII range and has code 225 in Latin-1 and Unicode.
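The character codes can be inspected directly with the built-in UNICODE function, and a binary collation shows what a comparison looks like when only those codes are considered. This is a small sketch; Latin1_General_100_BIN2 is just one binary collation chosen for the example.

```sql
SELECT
    UNICODE(N'A') AS code_upper_a,   -- 65
    UNICODE(N'a') AS code_lower_a,   -- 97
    UNICODE(N'á') AS code_a_acute,   -- 225, assuming the precomposed character U+00E1
    -- A binary collation compares code points, so case and accents always matter.
    CASE WHEN N'a' = N'A' COLLATE Latin1_General_100_BIN2
         THEN 'equal' ELSE 'different' END AS binary_case_compare,
    CASE WHEN N'a' = N'á' COLLATE Latin1_General_100_BIN2
         THEN 'equal' ELSE 'different' END AS binary_accent_compare;
```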

Kana sensitivity

When the Japanese kana character types Hiragana and Katakana are treated differently, the comparison is kana-sensitive.

Width sensitivity

When a single-byte character (half-width) and the same character represented as a double-byte character (full-width) are treated differently, the comparison is width-sensitive.
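Kana and width sensitivity are switched on with the _KS and _WS suffixes in the collation name. The sketch below assumes the Japanese_CI_AS and Latin1_General_CI_AS collation families are installed, which is the case on a standard SQL Server installation; the sample characters are chosen only for illustration.

```sql
SELECT
    -- Hiragana 'あ' versus Katakana 'ア'
    CASE WHEN N'あ' = N'ア' COLLATE Japanese_CI_AS
         THEN 'equal' ELSE 'different' END AS kana_insensitive,
    CASE WHEN N'あ' = N'ア' COLLATE Japanese_CI_AS_KS
         THEN 'equal' ELSE 'different' END AS kana_sensitive,
    -- Half-width 'A' versus full-width 'Ａ' (U+FF21)
    CASE WHEN N'A' = N'Ａ' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'different' END AS width_insensitive,
    CASE WHEN N'A' = N'Ａ' COLLATE Latin1_General_CI_AS_WS
         THEN 'equal' ELSE 'different' END AS width_sensitive;
```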

More info can be found here. I hope this answer helped.

ShaileshDev answered Sep 18 '22 08:09