I've read a lot about this, but I still have some questions. I'm not talking about case sensitivity here.

- If I have a character (ש, for example) and it is stored in an nvarchar column, which can hold anything, why would I need a collation here?
- If I'm "Facebook" and I need the ability to store all characters from all languages, what is the relationship between the collation and my nvarchar columns?

Thanks in advance.


What is the point of COLLATIONS for nvarchar (Unicode) columns?

1 Answer

Storing and representing characters is one thing, and knowing how to sort and compare them is another.

Unicode data, stored in the XML and N-prefixed types in SQL Server, can represent all characters in all languages (for the most part, and that is its goal) with a single character set. So for NCHAR / NVARCHAR data (I am leaving out NTEXT as it shouldn't be used anymore, and XML as it is not affected by Collations), the Collations do not change what characters can be stored. For CHAR and VARCHAR data, the Collations do affect what can be stored as each Collation points to a particular Code Page, which determines what can be stored in values 128 - 255.
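To make the Code Page point concrete, here is a minimal sketch (not part of the original answer; the table variable and column names are made up for illustration). It stores the Hebrew letter from the question in three columns that differ only in type and Collation. I would expect the VARCHAR column using a Latin1 Collation (code page 1252) to come back as "?", while the Hebrew VARCHAR column (code page 1255) and the NVARCHAR column keep the character:

```sql
-- Hypothetical table variable; all names here are illustrative only.
DECLARE @CodePageDemo TABLE
(
    VarcharLatin  VARCHAR(10)  COLLATE Latin1_General_100_CI_AS, -- code page 1252
    VarcharHebrew VARCHAR(10)  COLLATE Hebrew_100_CI_AS,         -- code page 1255
    NvarcharLatin NVARCHAR(10) COLLATE Latin1_General_100_CI_AS  -- Unicode; no code page involved
);

-- The same Unicode literal goes into all three columns.
INSERT INTO @CodePageDemo (VarcharLatin, VarcharHebrew, NvarcharLatin)
VALUES (N'ש', N'ש', N'ש');

-- Compare what each column was actually able to store.
SELECT VarcharLatin, VarcharHebrew, NvarcharLatin
FROM   @CodePageDemo;
```

The NVARCHAR column uses the same Latin1 Collation as the first VARCHAR column, yet it should not lose the character, which is the sense in which Collations do not limit what N-prefixed types can store.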

Now, while there is a default sort order for all characters, that cannot possibly work across all languages and cultures. There are many languages that share some / many / all characters, but have different rules for how to sort them. For example, the letter "C" comes before the letter "D" in most alphabets that use those letters. In US English, a combination of "C" and "H" (i.e. "CH" as two separate letters) would naturally come before any string starting with a "D". But, in a few languages, the two-letter combination of "CH" is special and sorts after "D":

IF (   N'CH' COLLATE Czech_CI_AI > N'D' COLLATE Czech_CI_AI
   AND N'C'  COLLATE Czech_CI_AI < N'D' COLLATE Czech_CI_AI
   AND N'CI' COLLATE Czech_CI_AI < N'D' COLLATE Czech_CI_AI
   ) PRINT 'Czech_CI_AI';

IF (   N'CH' COLLATE Czech_100_CI_AI > N'D' COLLATE Czech_100_CI_AI
   AND N'C'  COLLATE Czech_100_CI_AI < N'D' COLLATE Czech_100_CI_AI
   AND N'CI' COLLATE Czech_100_CI_AI < N'D' COLLATE Czech_100_CI_AI
   ) PRINT 'Czech_100_CI_AI';

IF (   N'CH' COLLATE Slovak_CI_AI > N'D' COLLATE Slovak_CI_AI
   AND N'C'  COLLATE Slovak_CI_AI < N'D' COLLATE Slovak_CI_AI
   AND N'CI' COLLATE Slovak_CI_AI < N'D' COLLATE Slovak_CI_AI
   ) PRINT 'Slovak_CI_AI';

IF (   N'CH' COLLATE Slovak_CS_AS > N'D' COLLATE Slovak_CS_AS
   AND N'C'  COLLATE Slovak_CS_AS < N'D' COLLATE Slovak_CS_AS
   AND N'CI' COLLATE Slovak_CS_AS < N'D' COLLATE Slovak_CS_AS
   ) PRINT 'Slovak_CS_AS';

IF (   N'CH' COLLATE Latin1_General_100_CI_AS > N'D' COLLATE Latin1_General_100_CI_AS
   AND N'C'  COLLATE Latin1_General_100_CI_AS < N'D' COLLATE Latin1_General_100_CI_AS
   AND N'CI' COLLATE Latin1_General_100_CI_AS < N'D' COLLATE Latin1_General_100_CI_AS
   ) PRINT 'Latin1_General_100_CI_AS'
ELSE PRINT 'Nope!';

Returns:

Czech_CI_AI
Czech_100_CI_AI
Slovak_CI_AI
Slovak_CS_AS
Nope!

To see examples of sorting rules across various cultures, please see: Collation Charts.
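The same rule shows up in ORDER BY, which is where it usually matters in practice. The following is a small sketch (the word list and the alias names are invented for illustration) that sorts one set of values under two Collations. Based on the comparisons above, "CH" should land somewhere after "D" under the Czech Collation and between "C" and "CI" under the Latin1 Collation:

```sql
-- Sort the same values using Czech rules ("CH" is treated as its own letter).
SELECT v.Word
FROM   (VALUES (N'C'), (N'CH'), (N'CI'), (N'D'), (N'E')) AS v(Word)
ORDER BY v.Word COLLATE Czech_100_CI_AI;

-- Sort the same values using Latin1_General rules ("CH" is just "C" followed by "H").
SELECT v.Word
FROM   (VALUES (N'C'), (N'CH'), (N'CI'), (N'D'), (N'E')) AS v(Word)
ORDER BY v.Word COLLATE Latin1_General_100_CI_AS;
```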

Also, in some languages certain letters or combinations of letters equate to other letters in ways that they do not in most other languages. For example, only in Danish does "å" equate to "aa". However, "å" does not equate to just a single "a":

IF (   N'aa' COLLATE Danish_Greenlandic_100_CI_AI =  N'å' COLLATE Danish_Greenlandic_100_CI_AI
   AND N'a'  COLLATE Danish_Greenlandic_100_CI_AI <> N'å' COLLATE Danish_Greenlandic_100_CI_AI
   ) PRINT 'Danish_Greenlandic_100_CI_AI';

IF (   N'aa' COLLATE Danish_Norwegian_CI_AI =  N'å' COLLATE Danish_Norwegian_CI_AI
   AND N'a'  COLLATE Danish_Norwegian_CI_AI <> N'å' COLLATE Danish_Norwegian_CI_AI
   ) PRINT 'Danish_Norwegian_CI_AI';

IF (   N'aa' COLLATE Latin1_General_100_CI_AI =  N'å' COLLATE Latin1_General_100_CI_AI
   AND N'a'  COLLATE Latin1_General_100_CI_AI <> N'å' COLLATE Latin1_General_100_CI_AI
   ) PRINT 'Latin1_General_100_CI_AI'
ELSE PRINT 'Nope!';

Returns:

Danish_Greenlandic_100_CI_AI
Danish_Norwegian_CI_AI
Nope!
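This is also the practical link between a Collation and an NVARCHAR column: the column's Collation decides which stored values an equality predicate matches. Here is a hedged sketch (the table variable, column name, and sample values are hypothetical). Given the comparisons above, I would expect the first query to find the row spelled with "å", and the second, which overrides the Collation, to find nothing:

```sql
-- Hypothetical table variable with a Danish Collation on the column.
DECLARE @Cities TABLE
(
    CityName NVARCHAR(50) COLLATE Danish_Norwegian_CI_AI
);

INSERT INTO @Cities (CityName)
VALUES (N'ålborg'), (N'alborg');

-- Uses the column's Danish Collation, under which "aa" equates to "å".
SELECT CityName
FROM   @Cities
WHERE  CityName = N'aalborg';

-- An explicit COLLATE clause overrides the column's Collation for this comparison.
SELECT CityName
FROM   @Cities
WHERE  CityName = N'aalborg' COLLATE Latin1_General_100_CI_AI;
```

The stored data is identical in both queries; only the comparison rules change.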

This is all highly complex, and I haven't even mentioned handling for right-to-left languages (Hebrew and Arabic), Chinese, Japanese, combining characters, etc.

If you want some deep insight into the rules, check out the Unicode Collation Algorithm (UCA). The examples above are based on examples in that documentation, though I do not believe all of the rules in the UCA have been implemented, especially since the Windows collations (collations not starting with SQL_) are based on Unicode 5.0 or 6.0, depending on which OS you are using and the version of the .NET Framework that is installed (see SortVersion for details).

So that is what the Collations do. If you want to see all of the Collations that are available, just run the following:

SELECT [name] FROM sys.fn_helpcollations() ORDER BY [name];
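That list is long, so it can help to filter it and pull a few properties at the same time. A small sketch, assuming you are interested in the Danish Collations used above (the LIKE pattern and the column aliases are my own choices):

```sql
-- List only the Danish Collations, with their description, code page, and version.
SELECT   [name],
         [description],
         COLLATIONPROPERTY([name], 'CodePage') AS [CodePage], -- code page used for CHAR / VARCHAR data
         COLLATIONPROPERTY([name], 'Version')  AS [Version]   -- collation version, derived from the version part of the name
FROM     sys.fn_helpcollations()
WHERE    [name] LIKE N'Danish%'
ORDER BY [name];
```

The CodePage value only matters for CHAR and VARCHAR data, as described at the top of this answer.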

answered Nov 02 '22 19:11

Solomon Rutzky


Tags:

sql-server

sql-server-2008

nvarchar

unicode

collation

Royi Namir
