Today I noticed an interesting sorting behavior in C#. I have two lists and I sort them:
var list1 = new List<string> { "A", "B", "C" };
var list2 = new List<string> { "AA", "BB", "CC" };
list1.Sort();
list2.Sort();
The two lists now contain:
>> list1
[0]: "A"
[1]: "B"
[2]: "C"
>> list2
[0]: "BB"
[1]: "CC"
[2]: "AA"
Why is "AA" put at the end?
Here is a demonstration: http://ideone.com/QCeUjx
It turns out that because I am using Danish culture settings, .NET treats "AA" as the Danish letter "Å", which comes last in the Danish alphabet.
Setting the locale to en-US gives me the sort order I expected ("AA", "BB", "CC").
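A minimal sketch of both observations. It assumes a .NET runtime with full globalization data (Windows NLS or ICU). In globalization-invariant mode, culture-specific collation is not applied, and creating "da-DK" may fail. The sketch builds the culture-aware comparers explicitly with StringComparer.Create, so the Danish and US English orderings can be compared side by side without depending on the machine's settings. It then shows the "change the current culture" approach for the parameterless Sort(). The variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

var danish = new CultureInfo("da-DK");
var english = new CultureInfo("en-US");

// Culture-aware comparers created explicitly, so the result does not
// depend on CultureInfo.CurrentCulture of the machine running this.
var daSorted = new List<string> { "AA", "BB", "CC" };
daSorted.Sort(StringComparer.Create(danish, ignoreCase: false));

var enSorted = new List<string> { "AA", "BB", "CC" };
enSorted.Sort(StringComparer.Create(english, ignoreCase: false));

Console.WriteLine("da-DK: " + string.Join(", ", daSorted));
Console.WriteLine("en-US: " + string.Join(", ", enSorted));

// Alternative: change the culture that the parameterless Sort() uses.
CultureInfo.CurrentCulture = english;
var list2 = new List<string> { "AA", "BB", "CC" };
list2.Sort();
Console.WriteLine("current culture (en-US): " + string.Join(", ", list2));
```

With Danish rules, "AA" is expected to land after "CC", reproducing the ordering from the question. The en-US comparer keeps "AA", "BB", "CC". Passing an explicit comparer is usually safer than changing CurrentCulture, because the culture setting affects every other culture-sensitive operation on that thread as well.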
This article has some background information.
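You can also use the overload of List.Sort that takes an IComparer<string> to ignore the current culture. StringComparer.Ordinal compares the numeric values of the characters (UTF-16 code units) instead of applying any language rules, so the result is independent of the current culture: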
list2.Sort(StringComparer.Ordinal);
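A self-contained version of the ordinal approach, written as a small top-level C# program. This is a sketch. string.CompareOrdinal is included only to show the same rule applied to a single comparison. Because the comparison looks only at char values, the sorted output does not change with the current culture.

```csharp
using System;
using System.Collections.Generic;

var list2 = new List<string> { "AA", "BB", "CC" };

// StringComparer.Ordinal compares UTF-16 code units one by one,
// so "AA" is never treated as the Danish letter "Å".
list2.Sort(StringComparer.Ordinal);

Console.WriteLine(string.Join(", ", list2)); // AA, BB, CC

// The same ordinal rule for a single comparison:
// a negative result means "AA" sorts before "BB".
int cmp = string.CompareOrdinal("AA", "BB");
Console.WriteLine(cmp < 0); // True
```

Ordinal comparison is also case-sensitive and places all uppercase ASCII letters before lowercase ones. It therefore suits identifiers and keys better than text that is shown to users.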
Demonstration
Here is some more information, from "Normalization and Sorting":
Some Unicode characters have multiple equivalent binary representations consisting of sets of combining and/or composite Unicode characters. Consequently, two strings can look identical but actually consist of different characters. The existence of multiple representations for a single character complicates sorting operations. The solution to this problem is to normalize each string, then use an ordinal comparison to sort the strings....
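To make the quoted advice concrete, here is a sketch using String.Normalize with NormalizationForm.FormC. The example strings are assumptions chosen to match the Danish "Å" from the question. "Å" can be stored either precomposed (U+00C5) or as "A" followed by a combining ring above (U+030A). The two forms only compare equal ordinally after both have been normalized.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Two encodings of the same visible letter "Å".
string precomposed = "\u00C5";   // LATIN CAPITAL LETTER A WITH RING ABOVE
string decomposed = "A\u030A";   // "A" + COMBINING RING ABOVE

// Ordinally these are different char sequences.
Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal)); // False

// Normalize both to Form C (canonical composition), then compare ordinally.
string a = precomposed.Normalize(NormalizationForm.FormC);
string b = decomposed.Normalize(NormalizationForm.FormC);
Console.WriteLine(string.Equals(a, b, StringComparison.Ordinal)); // True

// Same idea for sorting: normalize every string first, then sort ordinally.
var words = new List<string> { "B", decomposed, "A", precomposed };
var sorted = words
    .Select(w => w.Normalize(NormalizationForm.FormC))
    .OrderBy(w => w, StringComparer.Ordinal)
    .ToList();
Console.WriteLine(string.Join(", ", sorted));
```

Ordinal order follows code point values. It gives a consistent, culture-independent order for equivalent strings, but not necessarily a linguistically correct one.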