String sorting gotcha

Tags: sorting

I have the following C# code, compiled as Sort.exe:

using System;
using System.Collections.Generic;

class Test
{
    public static int Main(string[] args)
    {
        string text = null;
        List<string> lines = new List<string>();
        while((text = Console.In.ReadLine()) != null)
        {
            lines.Add(text);
        }

        lines.Sort();

        foreach(var line in lines)
            Console.WriteLine(line);

        return 0;
    }
}

I have a file, input.txt, which contains the following 5 lines:

x000000000000000000093.000000000
x000000000000000000037.000000000
x000000000000000100000.000000000
x000000000000000000538.000000000
x-00000000000000000020.000000000

If I run it from the command prompt, this is the output:

C:\Users\girijesh\AppData\Local\Temp>sort < input.txt
x000000000000000000037.000000000
x000000000000000000093.000000000
x-00000000000000000020.000000000
x000000000000000000538.000000000
x000000000000000100000.000000000

I can't understand what kind of string sorting puts the string starting with x- (3rd line of the output) in the middle of the strings starting with x0. I would expect that line to be either at the top or at the bottom. Excel shows the same behaviour.


asked May 14 '14 13:05

user2176811

1 Answer

In many cultures (including the invariant culture), the hyphen is given very little weight when sorting. For most text this makes sense: pre-whatever and prewhatever are pretty similar. For example, the following list sorts in this order, which I think is sensible:

preasdf
prewhatever
pre-whatever
prezxcv

You seem to want an ordinal comparison, where strings are compared purely by the numeric values of their characters (their UTF-16 code units in .NET). If you change the sort call to:

lines.Sort(StringComparer.Ordinal);

Then your results are:

x-00000000000000000020.000000000
x000000000000000000037.000000000
x000000000000000000093.000000000
x000000000000000000538.000000000
x000000000000000100000.000000000
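To see the two orderings side by side, here is a minimal sketch that sorts the same five lines with a culture-sensitive comparer and with the ordinal one. The class name ComparerDemo and the helper SortedWith are made up for illustration. Only the ordinal result is fully predictable: the culture-sensitive order may differ between runtimes and operating systems, because it depends on the globalization library in use.

```csharp
using System;
using System.Collections.Generic;

static class ComparerDemo
{
    // Hypothetical helper: returns a sorted copy of the input using the given comparer.
    public static List<string> SortedWith(IEnumerable<string> lines, IComparer<string> comparer)
    {
        var copy = new List<string>(lines);
        copy.Sort(comparer);
        return copy;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "x000000000000000000093.000000000",
            "x000000000000000000037.000000000",
            "x000000000000000100000.000000000",
            "x000000000000000000538.000000000",
            "x-00000000000000000020.000000000",
        };

        // Culture-sensitive: the position of the "x-" line may vary by platform.
        Console.WriteLine("InvariantCulture:");
        foreach (var line in SortedWith(lines, StringComparer.InvariantCulture))
            Console.WriteLine("  " + line);

        // Ordinal: '-' (0x2D) is numerically below '0' (0x30), so "x-" sorts first.
        Console.WriteLine("Ordinal:");
        foreach (var line in SortedWith(lines, StringComparer.Ordinal))
            Console.WriteLine("  " + line);
    }
}
```

Calling lines.Sort() with no arguments uses the default string comparison, which is culture-sensitive, so passing a comparer explicitly documents which behaviour you intend.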

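If you're wondering why the -...20.0 value ended up where it did, consider what the list would look like if you removed the - (and compare with the pre list above). Without the hyphen, that line has a 2 in the position where the 037 and 093 lines still have a 0 and where the 538 line has a 5, so it sorts between 093 and 538: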

x000000000000000000037.000000000
x000000000000000000093.000000000
x00000000000000000020.000000000
x000000000000000000538.000000000
x000000000000000100000.000000000
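The same thought experiment can be run in code. This is only a rough approximation of what a culture-sensitive comparison does, not its actual algorithm: it orders the lines by a key with every hyphen stripped out and compares those keys ordinally. The names HyphenDemo and OrderIgnoringHyphens are invented for this sketch.

```csharp
using System;
using System.Linq;

static class HyphenDemo
{
    // Orders lines by their text with every '-' removed, comparing the keys ordinally.
    // This only mimics the effect of giving the hyphen no weight; it is not
    // how the framework implements culture-sensitive comparison.
    public static string[] OrderIgnoringHyphens(string[] lines)
    {
        return lines
            .OrderBy(line => line.Replace("-", ""), StringComparer.Ordinal)
            .ToArray();
    }

    public static void Main()
    {
        var lines = new[]
        {
            "x000000000000000000093.000000000",
            "x000000000000000000037.000000000",
            "x000000000000000100000.000000000",
            "x000000000000000000538.000000000",
            "x-00000000000000000020.000000000",
        };

        // The original lines are printed, in the order of their hyphen-free keys.
        foreach (var line in OrderIgnoringHyphens(lines))
            Console.WriteLine(line);
    }
}
```

For this input, the x- line lands third, between the 093 and 538 lines, which matches the order reported in the question.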

If your input is always in the format x[some number], I'd parse the value after the x as a decimal or double and sort on that. That makes the expected behavior easier to guarantee and is a more robust approach overall.
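A minimal sketch of that numeric approach follows, assuming every line is exactly one x followed by a number that decimal can represent. The names NumericSortDemo, ParseValue and SortNumerically are hypothetical, and there is no handling for malformed lines (decimal.Parse throws on them).

```csharp
using System;
using System.Globalization;
using System.Linq;

static class NumericSortDemo
{
    // Parses everything after the leading 'x' as a decimal. The invariant culture
    // is used so that '.' is always treated as the decimal separator.
    public static decimal ParseValue(string line)
    {
        return decimal.Parse(line.Substring(1), NumberStyles.Number, CultureInfo.InvariantCulture);
    }

    // Orders the original lines by their parsed numeric value, ascending.
    public static string[] SortNumerically(string[] lines)
    {
        return lines.OrderBy(line => ParseValue(line)).ToArray();
    }

    public static void Main()
    {
        var lines = new[]
        {
            "x000000000000000000093.000000000",
            "x000000000000000000037.000000000",
            "x000000000000000100000.000000000",
            "x000000000000000000538.000000000",
            "x-00000000000000000020.000000000",
        };

        foreach (var line in SortNumerically(lines))
            Console.WriteLine(line);
    }
}
```

With the sample input, the -20 line comes first, followed by 37, 93, 538 and 100000. Unlike the ordinal sort, this keeps working if the numbers are not zero-padded to the same width.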


answered Oct 12 '22 22:10

Tim S.
