String Comparison differences between .NET and T-SQL?

Question

In a test case I've written, the string comparison doesn't appear to work the same way between SQL server / .NET CLR.

This C# code:

string lesser =  "SR2-A1-10-90";
string greater = "SR2-A1-100-10";

Debug.WriteLine(string.Compare("A","B"));
Debug.WriteLine(string.Compare(lesser, greater));

Will output:

-1
1

This SQL Server code:

declare @lesser varchar(20);
declare @greater varchar(20);

set @lesser =  'SR2-A1-10-90';
set @greater = 'SR2-A1-100-10';

IF @lesser < @greater
    SELECT 'Less Than';
ELSE
    SELECT 'Greater than';

Will output:

Less Than

Why the difference?

Martin Smith · Accepted Answer

This is documented here.

Windows collations (e.g. Latin1_General_CI_AS) use Unicode type collation rules. SQL Collations don't.

This causes the hyphen character to be treated differently between the two.

KeithS · Answer

Further to gbn's answer, you can make them behave the same by using CompareOptions.StringSort in C# (or by using StringComparison.Ordinal). This treats symbols as occurring before alphanumeric symbols, so "-" < "0".

However, Unicode vs ASCII doesn't explain anything, as the hex codes for the ASCII codepage are translated verbatim to the Unicode codepage: "-" is 002D (45) while "0" is 0030 (48).

What is happening is that .NET is using "linguistic" sorting by default, which is based on a non-ordinal ordering and weight applied to various symbols by the specified or current culture. This linguistic algorithm allows, for instance, "résumé" (spelled with accents) to appear immediately following "resume" (spelled without accents) in a sorted list of words, as "é" is given a fractional order just after "e" and well before "f". It also allows "cooperation" and "co-operation" to be placed closely together, as the dash symbol is given low "weight"; it matters only as the absolute final tiebreakers when sorting words like "bits", "bit's", and "bit-shift" (which would appear in that order).

So-called ordinal sorting (strictly according to Unicode values, with or without case insensitivity) will produce very different and sometimes illogical results, as variants of letters usually appear well after the basic undecorated Latin alphabet in ASCII/Unicode ordinals, while symbols occur before it. For instance, "é" comes after "z" and so the words "resume", "rosin", "ruble", "résumé" would be sorted in that order. "Bit's", "Bit-shift", "Biter", "Bits" would be sorted in that order as the apostrophe comes first, followed by the dash, then the letter "e", then the letter "s". Neither of these seem logical from a "natural language" perspective.

String Comparison differences between .NET and T-SQL?

Tags:

string

c#

sql

comparison

sql-server

Matt Brunell

2 Answers

Martin Smith

KeithS

Recent Activity

Donate For Us

String Comparison differences between .NET and T-SQL?

Tags:

string

c#

sql

comparison

sql-server

Matt Brunell

2 Answers

Martin Smith

KeithS

Related questions

Recent Activity

Donate For Us