I've been doing some TDD against some existing stored procedures. They return XML, so I've been using LINQ to XML.
At the moment, I'm working on a test that will prove that the data have been sorted properly. The test navigates through the XML, and creates an IEnumerable of an anonymous type containing the three columns that should be sorted. From that, it creates a second IEnumerable by sorting the first:
var sortedColumns = from g in columns
orderby g.ColumnA ascending,
g.ColumnB ascending,
g.ColumnC ascending
select g;
Finally, it asserts that the sorted colums are the same as the unsorted ones using SequenceEquals.
The problem comes when the database collation differs from the current collation. In particular, .NET is placing "W-" before "Wa" in ColumnB.
Is there a way to sort in the same order as an arbitrary SQL Server collation? If not, then how would I sort in the same order as SQL_Latin1_General_CP1_CI_AS?
If is a Windows collation, then it boils down to setting the appropriate locale and sort order, as in Sort Order Identifiers, which in .Net world goes by the currently set UI culture.
If is a SQL collation, then is a bit more complicated. If you use VARCHAR you're out of lock. For NVARCHAR, there is some hope. See Comparing SQL collations to Windows collations:
A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules. Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type. For example, if you are using the SQL collation "SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less than the string 'ab' because the hyphen ("-") is sorted as a separate character that comes before "b". However, if you convert these strings to Unicode and you perform the same comparison, the Unicode string N'a-c' is considered to be greater than N'ab' because the Unicode sorting rules use a "word sort" that ignores the hyphen.
For your purposes (TDD) I would recommend simply staying away from suspicious characters, like hyphen -
, or having two s
one after another (German ss issues), or capital I
(Turkish colliton issues), sh
(Spanish collation issues) etc etc... Stick to a small subset of characters that sort properly, like a,A,b,B
. I'm serious.
Write a custom IComparer
implementation that actually calls out to SQL server to compare each of the two values it's asked to compare.
It would be very slow, a little silly, but it would do exactly what you're asking.
Or if you only care about replicating one particular collation sequence exactly and not any random collation, then write a one-time-use routine to call out to SQL Server and build you a collection of all characters in the order SQL server will put them and use that to build a custom IComparer
implementation which will then match SQL server's sequence without calling out to SQL server at run time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With