String sorting gotcha

Tags: c#, sorting

I have the following C# code, compiled as Sort.exe:

using System;
using System.Collections.Generic;

class Test
{
    public static int Main(string[] args)
    {
        string text = null;
        List<string> lines = new List<string>();
        while((text = Console.In.ReadLine()) != null)
        {
            lines.Add(text);
        }

        lines.Sort();

        foreach(var line in lines)
            Console.WriteLine(line);

        return 0;
    }
}

I have a file, input.txt, that contains the following 5 lines:

x000000000000000000093.000000000
x000000000000000000037.000000000
x000000000000000100000.000000000
x000000000000000000538.000000000
x-00000000000000000020.000000000

If I run it from the command prompt, this is the output:

C:\Users\girijesh\AppData\Local\Temp>sort < input.txt
x000000000000000000037.000000000
x000000000000000000093.000000000
x-00000000000000000020.000000000
x000000000000000000538.000000000
x000000000000000100000.000000000

I can't work out what kind of string sorting this is: the string starting with x- (3rd line of the output) lands in the middle of the strings starting with x0. I would have expected that line to be either at the top or at the bottom. Excel shows the same behaviour.

user2176811 asked May 14 '14 13:05


1 Answer

In many cultures (including the invariant culture), the hyphen carries only minor weight for sorting purposes. In most text this makes sense: pre-whatever and prewhatever are pretty similar. For example, the following list is sorted like this, which I think is good:

preasdf
prewhatever
pre-whatever
prezxcv
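
To try this yourself, here is a minimal sketch that sorts the same four words with a culture-sensitive comparer and with an ordinal one. The class name HyphenSortDemo and the helper SortedWith are made up for illustration. The culture-sensitive result is deliberately not annotated, since it depends on the collation data your runtime uses and may not match the list above everywhere.

```csharp
using System;
using System.Collections.Generic;

static class HyphenSortDemo
{
    // Hypothetical helper: returns a sorted copy using the given comparer.
    public static List<string> SortedWith(IEnumerable<string> items, StringComparer comparer)
    {
        var copy = new List<string>(items);
        copy.Sort(comparer);
        return copy;
    }

    static void Main()
    {
        string[] words = { "prezxcv", "pre-whatever", "preasdf", "prewhatever" };

        // Culture-sensitive: the position of "pre-whatever" depends on the
        // runtime's collation rules, so no expected output is claimed here.
        Console.WriteLine(string.Join(", ", SortedWith(words, StringComparer.InvariantCulture)));

        // Ordinal: '-' (U+002D) is below every letter, so "pre-whatever" sorts first.
        Console.WriteLine(string.Join(", ", SortedWith(words, StringComparer.Ordinal)));
    }
}
```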

You seem to want an ordinal comparison, where strings are compared purely by the numeric values of their characters (UTF-16 code units, which for these ASCII characters equal their Unicode code points). If you change the sort line to:

lines.Sort(StringComparer.Ordinal);

Then your results are:

x-00000000000000000020.000000000
x000000000000000000037.000000000
x000000000000000000093.000000000
x000000000000000000538.000000000
x000000000000000100000.000000000
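
The reason x- comes first under an ordinal comparison is just the character values involved. A small sketch, with CodePointDemo and OrdinalSign as illustrative names only:

```csharp
using System;

static class CodePointDemo
{
    // Hypothetical helper: -1, 0 or 1 depending on the ordinal order of a and b.
    public static int OrdinalSign(string a, string b)
    {
        return Math.Sign(string.CompareOrdinal(a, b));
    }

    static void Main()
    {
        Console.WriteLine((int)'-'); // 45 (U+002D)
        Console.WriteLine((int)'0'); // 48 (U+0030)

        // The strings first differ at index 1, where '-' (45) < '0' (48).
        Console.WriteLine(OrdinalSign("x-0", "x00")); // -1
    }
}
```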

If you're wondering why the -...20.0 value ended up where it did, consider what the sorted list would look like if you removed the - (and compare with the pre list above).

x000000000000000000037.000000000
x000000000000000000093.000000000
x00000000000000000020.000000000
x000000000000000000538.000000000
x000000000000000100000.000000000
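
That thought experiment can be approximated in code: sort ordinally on a key with every hyphen deleted, but print the original lines. This is only a rough stand-in for what a culture-sensitive comparison does, not its actual algorithm, and the names IgnoreHyphenDemo and SortIgnoringHyphens are made up for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IgnoreHyphenDemo
{
    // Hypothetical helper: orders lines ordinally by a key with all '-' removed.
    // The original lines, hyphens included, are what gets returned.
    public static List<string> SortIgnoringHyphens(IEnumerable<string> lines)
    {
        return lines
            .OrderBy(line => line.Replace("-", ""), StringComparer.Ordinal)
            .ToList();
    }

    static void Main()
    {
        string[] lines =
        {
            "x000000000000000000093.000000000",
            "x000000000000000000037.000000000",
            "x000000000000000100000.000000000",
            "x000000000000000000538.000000000",
            "x-00000000000000000020.000000000",
        };

        foreach (var line in SortIgnoringHyphens(lines))
            Console.WriteLine(line);
    }
}
```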

If your input is always in the format x[some number], I'd parse the value after the x as a decimal or double and sort on that. That makes the expected ordering easier to guarantee and is more robust overall.
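
A minimal sketch of the parsing approach, assuming every line really is an x followed by a number that fits in a decimal. NumericSortDemo, ParseValue and SortNumerically are illustrative names, and there is no handling for malformed lines (decimal.Parse would throw).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class NumericSortDemo
{
    // Hypothetical helper: parses everything after the leading 'x' as a decimal.
    // The invariant culture is used so '.' is always the decimal separator.
    public static decimal ParseValue(string line)
    {
        return decimal.Parse(line.Substring(1), NumberStyles.Number, CultureInfo.InvariantCulture);
    }

    // Orders the lines by their numeric value rather than by their text.
    public static List<string> SortNumerically(IEnumerable<string> lines)
    {
        return lines.OrderBy(line => ParseValue(line)).ToList();
    }

    static void Main()
    {
        string text;
        var lines = new List<string>();
        while ((text = Console.In.ReadLine()) != null)
            lines.Add(text);

        foreach (var line in SortNumerically(lines))
            Console.WriteLine(line);
    }
}
```

With this, the -20 line sorts first because it is numerically the smallest, independent of any string comparison rules.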

Tim S. answered Oct 12 '22 22:10