String sorting issue in C#

I have a List like this:

    List<string> items = new List<string>();
    items.Add("-");
    items.Add(".");
    items.Add("a-");
    items.Add("a.");
    items.Add("a-a");
    items.Add("a.a");

    items.Sort();

    string output = string.Empty;
    foreach (string s in items)
    {
        output += s + Environment.NewLine;
    }

    MessageBox.Show(output);

The output is coming back as

    -
    .
    a-
    a.
    a.a
    a-a

whereas I expected the results to be

    -
    .
    a-
    a.
    a-a
    a.a

Any idea why "a-a" does not come before "a.a", whereas "a-" comes before "a."?

Satya asked Feb 20 '12 00:02

1 Answer

I suspect that in the last case "-" is treated differently because of culture-specific settings (perhaps as a "dash" rather than as the "minus" in the first strings). MSDN warns about this:

The comparison uses the current culture to obtain culture-specific information such as casing rules and the alphabetic order of individual characters. For example, a culture could specify that certain combinations of characters be treated as a single character, or uppercase and lowercase characters be compared in a particular way, or that the sorting order of a character depends on the characters that precede or follow it.

See also this MSDN page:

The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them; for example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases; therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.

So the hyphen gets special treatment in the default (word) sort mode, to make the sort order more "natural".
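To see the three modes side by side on the question's two problem pairs, here is a small sketch (not from the original answer; the class and method names are hypothetical). It assumes that `CompareOptions.None` on the current culture corresponds to the default "word sort" and `CompareOptions.StringSort` to "string sort". The culture-sensitive results depend on the current culture and on the runtime (.NET Framework uses NLS on Windows, while newer .NET versions may use ICU), so only the ordinal column is predictable.

```csharp
using System;
using System.Globalization;

public static class SortModeDemo
{
    // Culture-sensitive "word sort": the default behind string.Compare(a, b) and List<string>.Sort().
    public static int WordSort(string a, string b) =>
        Math.Sign(CultureInfo.CurrentCulture.CompareInfo.Compare(a, b, CompareOptions.None));

    // "String sort": still culture-sensitive, but symbols such as '-' get no special weight.
    public static int StringSort(string a, string b) =>
        Math.Sign(CultureInfo.CurrentCulture.CompareInfo.Compare(a, b, CompareOptions.StringSort));

    // Ordinal: compares UTF-16 code units ('-' is U+002D, '.' is U+002E).
    public static int OrdinalSort(string a, string b) =>
        Math.Sign(string.CompareOrdinal(a, b));

    public static void Main()
    {
        var pairs = new[] { ("a.", "a-"), ("a.a", "a-a") };
        foreach (var (a, b) in pairs)
        {
            // Culture-sensitive results vary by culture and runtime; only the ordinal column is fixed.
            Console.WriteLine(
                $"\"{a}\" vs \"{b}\": word={WordSort(a, b)}, string={StringSort(a, b)}, ordinal={OrdinalSort(a, b)}");
        }
    }
}
```

If the word-sort column shows different signs for the two pairs while the ordinal column shows 1 for both, you are seeing the same special hyphen weighting described in the quote.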

You can get a plain ordinal sort by requesting it explicitly:

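     // Default, culture-sensitive comparison (results as observed in this answer):
     Console.WriteLine(string.Compare("a.", "a-"));                  //1
     Console.WriteLine(string.Compare("a.a", "a-a"));                //-1

     // Ordinal comparison of char codes ('-' is U+002D, '.' is U+002E):
     Console.WriteLine(string.Compare("a.", "a-", StringComparison.Ordinal));    //1
     Console.WriteLine(string.Compare("a.a", "a-a", StringComparison.Ordinal));  //1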

To sort the original collection using ordinal comparison, use:

     items.Sort(StringComparer.Ordinal);
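For reference, a self-contained sketch of the question's program with the ordinal fix applied. It writes to the console instead of calling `MessageBox.Show` so it runs without WinForms, and the class and method names are hypothetical. Note that ordinal order is consistent but is not exactly the order listed in the question: since '-' (U+002D) has a lower code than '.' (U+002E), and a string sorts before any longer string it is a prefix of, "a-a" lands before "a.".

```csharp
using System;
using System.Collections.Generic;

public static class OrdinalSortExample
{
    // Sorts a copy of the input by UTF-16 code unit values, independent of the current culture.
    public static List<string> SortOrdinal(IEnumerable<string> source)
    {
        var items = new List<string>(source);
        items.Sort(StringComparer.Ordinal);
        return items;
    }

    public static void Main()
    {
        var items = new List<string> { "-", ".", "a-", "a.", "a-a", "a.a" };

        // Prints one item per line: -, ., a-, a-a, a., a.a
        Console.WriteLine(string.Join(Environment.NewLine, SortOrdinal(items)));
    }
}
```

Passing `StringComparer.Ordinal` makes the result identical on every machine, which is usually what you want for identifiers or keys rather than text shown to users.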
Max Galkin answered Oct 17 '22 07:10