If I have a list of strings that contain only digits and dashes, they sort in ascending order like so:
s = s.OrderBy(t => t).ToList();
66-0616280-000
66-0616280-100
66-06162801000
66-06162801040
This is as expected.
However, if the strings contain letters, the sort order is somewhat unexpected. For example, here is the same list of strings with a trailing A replacing the final 0 in each, and yes, it is sorted:
66-0616280-00A
66-0616280100A
66-0616280104A
66-0616280-10A
I would have expected them to sort like so:
66-0616280-00A
66-0616280-10A
66-0616280100A
66-0616280104A
Why does the sort behave differently on the string when it contains letters vs. when it contains only numbers?
Thanks in advance.
It's because the default StringComparer is culture-sensitive. As far as I can tell, Comparer<string>.Default delegates to string.CompareTo(string), which uses the current culture:
This method performs a word (case-sensitive and culture-sensitive) comparison using the current culture. For more information about word, string, and ordinal sorts, see System.Globalization.CompareOptions.
Then the page for CompareOptions includes:
The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases. Therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.
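To make the three sort modes concrete, here is a minimal sketch that compares the two problem strings under each mode. The class name SortModesDemo and the helper Sign are made up for illustration, and the invariant culture is an assumption chosen to keep the example independent of machine settings. The word-sort and string-sort results can vary with the runtime and its globalization library, so only the ordinal result is annotated.

```csharp
using System;
using System.Globalization;

public static class SortModesDemo
{
    // Hypothetical helper: returns -1, 0, or 1 for the comparison of a and b
    // under the given options, using the invariant culture (an assumption).
    public static int Sign(string a, string b, CompareOptions options) =>
        Math.Sign(CultureInfo.InvariantCulture.CompareInfo.Compare(a, b, options));

    public static void Main()
    {
        const string a = "66-0616280104A";
        const string b = "66-0616280-10A";

        // Word sort: culture-sensitive; the hyphen may carry a very small weight.
        Console.WriteLine($"Word:    {Sign(a, b, CompareOptions.None)}");

        // String sort: culture-sensitive, without the special cases of word sort.
        Console.WriteLine($"String:  {Sign(a, b, CompareOptions.StringSort)}");

        // Ordinal sort: compares UTF-16 code units. The strings first differ at
        // index 10, where '1' (U+0031) is greater than '-' (U+002D), so this prints 1.
        Console.WriteLine($"Ordinal: {Sign(a, b, CompareOptions.Ordinal)}");
    }
}
```

CompareOptions.Ordinal cannot be combined with other flags, which is why it is passed on its own here.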
("Small weight" isn't quite the same as "ignored" as quoted in Andrei's answer, but the effects are similar here.)
If you specify StringComparer.Ordinal, you get results of:
66-0616280-00A
66-0616280-10A
66-0616280100A
66-0616280104A
Specify it as the second argument to OrderBy:
s = s.OrderBy(t => t, StringComparer.Ordinal).ToList();
You can see the difference here:
// Culture-sensitive comparison (current culture)
Console.WriteLine(Comparer<string>.Default.Compare("66-0616280104A", "66-0616280-10A"));
// Ordinal comparison (UTF-16 code unit values)
Console.WriteLine(StringComparer.Ordinal.Compare("66-0616280104A", "66-0616280-10A"));
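For anyone who wants to run the whole thing, here is a small self-contained sketch of the ordinal sort applied to the list from the question. The class name OrdinalSortDemo and the helper SortOrdinal are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrdinalSortDemo
{
    // Hypothetical helper: orders strings by UTF-16 code unit values, so
    // '-' (U+002D) sorts before the digits '0'..'9' (U+0030..U+0039).
    public static List<string> SortOrdinal(IEnumerable<string> items) =>
        items.OrderBy(t => t, StringComparer.Ordinal).ToList();

    public static void Main()
    {
        var s = new List<string>
        {
            "66-0616280104A",
            "66-0616280-10A",
            "66-0616280100A",
            "66-0616280-00A",
        };

        // Prints, one per line:
        // 66-0616280-00A
        // 66-0616280-10A
        // 66-0616280100A
        // 66-0616280104A
        foreach (var item in SortOrdinal(s))
        {
            Console.WriteLine(item);
        }
    }
}
```

Because an ordinal comparison ignores culture entirely, this ordering should be the same on every machine, which is usually what you want for identifiers such as part numbers.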