Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weird SQL Server 2005 Collation difference between varchar() and nvarchar()

Can someone please explain this:

SELECT 
  CASE WHEN CAST('iX' AS nvarchar(20)) 
      > CAST('-X' AS nvarchar(20)) THEN 1 ELSE 0 END,
  CASE WHEN CAST('iX' AS varchar(20)) 
      > CAST('-X' AS varchar(20)) THEN 1 ELSE 0 END

Results: 0 1

SELECT 
  CASE WHEN CAST('i' AS nvarchar(20)) 
      > CAST('-' AS nvarchar(20)) THEN 1 ELSE 0 END,
  CASE WHEN CAST('i' AS varchar(20)) 
      > CAST('-' AS varchar(20)) THEN 1 ELSE 0 END

Results: 1 1

On the first query, the nvarchar() result is not what I'm expecting, and yet removing the X make the nvarchar() sort happen as expected.

(My original queries used the '' and N'' literal syntax to distinguish varchar() and nvarchar() rather than CAST() and got the same result.)

Collation setting for the database is SQL_Latin1_General_CP1_CI_AS.

like image 582
richardtallent Avatar asked Jun 16 '10 20:06

richardtallent


1 Answers

When comparing unicode text, hyphens are treated specially. Unicode comparison uses "dictionary order", which ignores hyphens. This is not the case with non-unicode text comparison.

Comparing -X and iX, is like comparing X and iX, so -X, the left side, is greater. When comparing "-" and "i", is like comparing "" and "i", so "i", the right side is greater.

From MSDN,

A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules. Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type. For example, if you are using the SQL collation "SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less than the string 'ab' because the hyphen ("-") is sorted as a separate character that comes before "b". However, if you convert these strings to Unicode and you perform the same comparison, the Unicode string N'a-c' is considered to be greater than N'ab' because the Unicode sorting rules use a "word sort" that ignores the hyphen.

SELCT body From MSDN_Articles WHERE url IN ("http://support.microsoft.com/kb/322112")

like image 150
mdma Avatar answered Dec 29 '22 07:12

mdma