
Alphabetical order does not compare from left to right?

I thought that in .NET strings were compared alphabetically and that they were compared from left to right.

string[] strings = { "-1", "1", "1Foo", "-1Foo" };
Array.Sort(strings);
Console.WriteLine(string.Join(",", strings));

I'd expect this (or the two entries that begin with a minus sign first):

1,1Foo,-1,-1Foo

But the result is:

1,-1,1Foo,-1Foo

It seems to be a mixture: either the minus sign is ignored, or multiple characters are compared even when the first character already differs.

Edit: I've now tested OrdinalIgnoreCase and I get the expected order:

Array.Sort(strings, StringComparer.OrdinalIgnoreCase);

But if I use InvariantCultureIgnoreCase, I still get the unexpected order.

Tim Schmelter asked Sep 30 '22 11:09



1 Answer

Jon Skeet to the rescue here

Specifically:

The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases. Therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.
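To make the three sort kinds in the quote concrete, here is a minimal sketch that sorts the question's array once per kind. It assumes the invariant culture for the two culture-sensitive sorts, and the variable names are only illustrative. The word-sort and string-sort results are deliberately left without expected output, since they depend on the runtime and its globalization library; only the ordinal result is fixed by the code unit values.

```csharp
using System;
using System.Globalization;

string[] input = { "-1", "1", "1Foo", "-1Foo" };

// Assumption: the invariant culture stands in for "a culture-sensitive comparison".
CompareInfo invariant = CultureInfo.InvariantCulture.CompareInfo;

// Word sort: the default culture-sensitive comparison (CompareOptions.None).
string[] wordSorted = (string[])input.Clone();
Array.Sort(wordSorted, (a, b) => invariant.Compare(a, b, CompareOptions.None));

// String sort: culture-sensitive, but requested without the special hyphen handling.
string[] stringSorted = (string[])input.Clone();
Array.Sort(stringSorted, (a, b) => invariant.Compare(a, b, CompareOptions.StringSort));

// Ordinal sort: compares UTF-16 code units, so '-' (U+002D) sorts before '1' (U+0031).
string[] ordinalSorted = (string[])input.Clone();
Array.Sort(ordinalSorted, StringComparer.Ordinal);

Console.WriteLine("word:    " + string.Join(",", wordSorted));
Console.WriteLine("string:  " + string.Join(",", stringSorted));
Console.WriteLine("ordinal: " + string.Join(",", ordinalSorted)); // ordinal: -1,-1Foo,1,1Foo
```

Running this on your own machine is the quickest way to see which of the culture-sensitive behaviours your runtime actually applies.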

Passing StringComparer.Ordinal to Array.Sort makes it behave as you want:

string[] strings = { "-1", "1", "10", "-10", "a", "ba", "-a" };
Array.Sort(strings, StringComparer.Ordinal);
Console.WriteLine(string.Join(",", strings));
// prints: -1,-10,-a,1,10,a,ba

Edit:
About Ordinal, quoting from the MSDN page for the CompareOptions enumeration:

Ordinal: Indicates that the string comparison must use successive Unicode UTF-16 encoded values of the string (code unit by code unit comparison), leading to a fast comparison but one that is culture-insensitive. A string starting with a code unit XXXX₁₆ comes before a string starting with YYYY₁₆, if XXXX₁₆ is less than YYYY₁₆. This value cannot be combined with other CompareOptions values and must be used alone.

There is also String.CompareOrdinal if you want an ordinal comparison of two strings.
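As a small illustration of String.CompareOrdinal, the sketch below compares a few pairs taken from the question. Only the sign of the result is relied on here; the variable names are made up for the example.

```csharp
using System;

// '-' is U+002D and '1' is U+0031, so "-1" orders before "1".
int minusVsDigit = string.CompareOrdinal("-1", "1");

// When one string is a prefix of the other, the shorter one orders first.
int prefixVsLonger = string.CompareOrdinal("1", "1Foo");

// Identical strings compare as equal.
int same = string.CompareOrdinal("1Foo", "1Foo");

Console.WriteLine(Math.Sign(minusVsDigit));   // -1
Console.WriteLine(Math.Sign(prefixVsLonger)); // -1
Console.WriteLine(same);                      // 0
```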

Here's another note of interest:

When possible, the application should use string comparison methods that accept a CompareOptions value to specify the kind of comparison expected. As a general rule, user-facing comparisons are best served by the use of linguistic options (using the current culture), while security comparisons should specify Ordinal or OrdinalIgnoreCase.
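A rough sketch of that advice follows, with hypothetical values ("Admin", "co-op") chosen only for illustration. The ordinal checks give the same answer on every machine, while the linguistic comparison depends on the current culture and runtime, so its result is printed without a stated expectation.

```csharp
using System;
using System.Globalization;

// Hypothetical values for a security-style check (for example, a role name).
string expectedRole = "Admin";
string suppliedRole = "ADMIN";

// Ordinal: exact code unit match, independent of culture.
bool exactMatch = string.Equals(expectedRole, suppliedRole, StringComparison.Ordinal);

// OrdinalIgnoreCase: case-insensitive, still independent of culture.
bool caseInsensitiveMatch = string.Equals(expectedRole, suppliedRole, StringComparison.OrdinalIgnoreCase);

// User-facing ordering: linguistic comparison under the current culture.
int linguistic = string.Compare("co-op", "coop", CultureInfo.CurrentCulture, CompareOptions.None);

Console.WriteLine(exactMatch);           // False
Console.WriteLine(caseInsensitiveMatch); // True
Console.WriteLine(linguistic);           // culture- and runtime-dependent
```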

I guess we humans expect ordinal when dealing with strings :)

Noctis answered Oct 17 '22 19:10
