Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Sort in .NET Same as a SQL Server Collation?

I've been doing some TDD against some existing stored procedures. They return XML, so I've been using LINQ to XML.

At the moment, I'm working on a test that will prove that the data have been sorted properly. The test navigates through the XML, and creates an IEnumerable of an anonymous type containing the three columns that should be sorted. From that, it creates a second IEnumerable by sorting the first:

var sortedColumns = from g in columns
                    orderby g.ColumnA ascending,
                            g.ColumnB ascending,
                            g.ColumnC ascending
                    select g;

Finally, it asserts that the sorted colums are the same as the unsorted ones using SequenceEquals.

The problem comes when the database collation differs from the current collation. In particular, .NET is placing "W-" before "Wa" in ColumnB.

Is there a way to sort in the same order as an arbitrary SQL Server collation? If not, then how would I sort in the same order as SQL_Latin1_General_CP1_CI_AS?

like image 295
John Saunders Avatar asked Jan 06 '11 23:01

John Saunders


2 Answers

If is a Windows collation, then it boils down to setting the appropriate locale and sort order, as in Sort Order Identifiers, which in .Net world goes by the currently set UI culture.

If is a SQL collation, then is a bit more complicated. If you use VARCHAR you're out of lock. For NVARCHAR, there is some hope. See Comparing SQL collations to Windows collations:

A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules. Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type. For example, if you are using the SQL collation "SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less than the string 'ab' because the hyphen ("-") is sorted as a separate character that comes before "b". However, if you convert these strings to Unicode and you perform the same comparison, the Unicode string N'a-c' is considered to be greater than N'ab' because the Unicode sorting rules use a "word sort" that ignores the hyphen.

For your purposes (TDD) I would recommend simply staying away from suspicious characters, like hyphen -, or having two s one after another (German ss issues), or capital I (Turkish colliton issues), sh (Spanish collation issues) etc etc... Stick to a small subset of characters that sort properly, like a,A,b,B. I'm serious.

like image 87
Remus Rusanu Avatar answered Nov 10 '22 11:11

Remus Rusanu


Write a custom IComparer implementation that actually calls out to SQL server to compare each of the two values it's asked to compare.

It would be very slow, a little silly, but it would do exactly what you're asking.

Or if you only care about replicating one particular collation sequence exactly and not any random collation, then write a one-time-use routine to call out to SQL Server and build you a collection of all characters in the order SQL server will put them and use that to build a custom IComparer implementation which will then match SQL server's sequence without calling out to SQL server at run time.

like image 39
Samuel Neff Avatar answered Nov 10 '22 13:11

Samuel Neff