
Sorting with a specific culture - "BB" may sort before "AA" in Danish and Norwegian

Today I noticed an interesting sorting behavior in C#. I have two lists and I sort them:

var list1 = new List<string> { "A", "B", "C" };
var list2 = new List<string> { "AA", "BB", "CC" };
list1.Sort();
list2.Sort();

The two lists now contain:

>> list1
[0]: "A"
[1]: "B"
[2]: "C"

>> list2
[0]: "BB"
[1]: "CC"
[2]: "AA"

Why is "AA" placed at the end?

Here is a demonstration: http://ideone.com/QCeUjx

dlebech asked Oct 08 '13 08:10



2 Answers

It turns out that because I am using Danish culture settings, .NET treats "AA" as the Danish letter "Å", which is the last letter of the Danish alphabet.

Setting the locale to en-US gives me the sort order I expected ("AA", "BB", "CC").
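
A minimal sketch of sorting with an explicitly chosen culture instead of relying on the machine's current culture. The class and method names (CultureSortSketch, SortWithCulture) are hypothetical and only for illustration; StringComparer.Create and CultureInfo are the standard .NET APIs. The position of "AA" under da-DK can depend on the runtime and its globalization data, so the sketch prints the result rather than assuming it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CultureSortSketch
{
    // Returns a sorted copy of the input, ordered by the collation rules
    // of the named culture (hypothetical helper, not part of .NET).
    public static List<string> SortWithCulture(IEnumerable<string> items, string cultureName)
    {
        var comparer = StringComparer.Create(new CultureInfo(cultureName), ignoreCase: false);
        var sorted = new List<string>(items);
        sorted.Sort(comparer);
        return sorted;
    }

    public static void Main()
    {
        var list2 = new List<string> { "AA", "BB", "CC" };

        // English collation: the order expected in the question.
        Console.WriteLine(string.Join(", ", SortWithCulture(list2, "en-US")));

        // Danish collation: may place "AA" last, as observed in the question.
        Console.WriteLine(string.Join(", ", SortWithCulture(list2, "da-DK")));
    }
}
```

Passing the comparer explicitly keeps the result the same regardless of the culture settings of the machine running the code.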

This article has some background information.

dlebech answered Nov 09 '22 23:11



You can also pass a comparer to the List<T>.Sort(IComparer<T>) overload so that the current culture is ignored. StringComparer.Ordinal performs a simple byte comparison that is independent of the current language:

list1.Sort(StringComparer.Ordinal);

Demonstration
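
As a self-contained sketch of the same idea, the program below forces the Danish culture and then sorts with StringComparer.Ordinal. The class and method names (OrdinalSortSketch, SortOrdinal) are hypothetical, and it assumes the da-DK culture is available on the machine.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class OrdinalSortSketch
{
    // Returns a sorted copy of the input using ordinal comparison,
    // which ignores the current culture (hypothetical helper).
    public static List<string> SortOrdinal(IEnumerable<string> items)
    {
        var sorted = new List<string>(items);
        sorted.Sort(StringComparer.Ordinal);
        return sorted;
    }

    public static void Main()
    {
        // Mimic the asker's environment (assumption: da-DK is installed).
        CultureInfo.CurrentCulture = new CultureInfo("da-DK");

        var list2 = new List<string> { "AA", "BB", "CC" };
        Console.WriteLine(string.Join(", ", SortOrdinal(list2))); // AA, BB, CC
    }
}
```

Ordinal ordering is stable across machines, but it is not a linguistic order: for example, all uppercase ASCII letters sort before all lowercase ones.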

Here is some background information: Normalization and Sorting

Some Unicode characters have multiple equivalent binary representations consisting of sets of combining and/or composite Unicode characters. Consequently, two strings can look identical but actually consist of different characters. The existence of multiple representations for a single character complicates sorting operations. The solution to this problem is to normalize each string, then use an ordinal comparison to sort the strings....
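
The normalize-then-compare approach described in the quote can be sketched as follows. The names NormalizationSketch and OrdinalEqualsAfterNormalization are hypothetical; string.Normalize and NormalizationForm are standard .NET APIs. The example uses "é", which can be written as one composed code point (U+00E9) or as "e" followed by a combining acute accent (U+0301). With standard globalization support, the raw ordinal comparison should report the two strings as different and the normalized comparison should report them as equal.

```csharp
using System;
using System.Text;

public static class NormalizationSketch
{
    // Normalizes both strings to Form C, then compares them ordinally
    // (hypothetical helper illustrating the quoted advice).
    public static bool OrdinalEqualsAfterNormalization(string a, string b)
    {
        return string.Equals(
            a.Normalize(NormalizationForm.FormC),
            b.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);
    }

    public static void Main()
    {
        string composed = "\u00E9";     // single code point
        string decomposed = "e\u0301";  // base letter + combining accent

        // Raw ordinal comparison sees different code unit sequences.
        Console.WriteLine(string.Equals(composed, decomposed, StringComparison.Ordinal));

        // After normalization both strings have the same representation.
        Console.WriteLine(OrdinalEqualsAfterNormalization(composed, decomposed));
    }
}
```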

Tim Schmelter answered Nov 10 '22 01:11
