Bug in the string comparing of the .NET Framework

Tags:

c#

.net

sorting

string-comparison

It's a requirement for any comparison sort to work that the underlying order operator is transitive and antisymmetric.

In .NET, that's not true for some strings:

    using System;
    using System.Globalization;

    static void CompareBug() {
      string x = "\u002D\u30A2";  // or just "-ア" if charset allows
      string y = "\u3042";        // or just "あ" if charset allows

      Console.WriteLine(x.CompareTo(y));  // positive one
      Console.WriteLine(y.CompareTo(x));  // positive one
      Console.WriteLine(StringComparer.InvariantCulture.Compare(x, y));  // positive one
      Console.WriteLine(StringComparer.InvariantCulture.Compare(y, x));  // positive one

      var ja = StringComparer.Create(new CultureInfo("ja-JP", false), false);
      Console.WriteLine(ja.Compare(x, y));  // positive one
      Console.WriteLine(ja.Compare(y, x));  // positive one
    }

You see that x is strictly greater than y, and y is strictly greater than x.

Since x.CompareTo(x), y.CompareTo(y) and so on all give zero (0), as they should, the failure lies in antisymmetry, and this is clearly not an order. Not surprisingly, I get unpredictable results when I Sort arrays or lists containing strings like x and y. Though I haven't tested this, I'm sure SortedDictionary<string, WhatEver> will have problems keeping itself in sorted order and/or locating items if strings like x and y are used for keys.
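A quick way to detect the problem described above is to check antisymmetry directly: for every pair, the signs of `Compare(a, b)` and `Compare(b, a)` must be opposite (or both zero). The following is a minimal sketch; `FindAntisymmetryViolations` is a hypothetical helper, not a framework API, and what it reports for `StringComparer.InvariantCulture` depends on the runtime and its globalization data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a framework API): returns every pair (a, b) for which
// sign(Compare(a, b)) != -sign(Compare(b, a)), i.e. antisymmetry is violated.
static List<(string A, string B)> FindAntisymmetryViolations(
    IComparer<string> comparer, IReadOnlyList<string> items)
{
    var violations = new List<(string A, string B)>();
    for (int i = 0; i < items.Count; i++)
    {
        // j starts at i so that Compare(a, a) == 0 is checked as well.
        for (int j = i; j < items.Count; j++)
        {
            int ab = Math.Sign(comparer.Compare(items[i], items[j]));
            int ba = Math.Sign(comparer.Compare(items[j], items[i]));
            if (ab != -ba)
            {
                violations.Add((items[i], items[j]));
            }
        }
    }
    return violations;
}

// Both pairs of strings from the question.
var samples = new List<string>
{
    "\u002D\u30A2", "\u3042",
    "\u4E00\u30A0", "\u4E00\u002D\u0041",
};

// What this prints depends on the runtime's globalization data (e.g. NLS vs. ICU).
foreach (var (a, b) in FindAntisymmetryViolations(StringComparer.InvariantCulture, samples))
{
    Console.WriteLine($"Inconsistent pair: \"{a}\" vs \"{b}\"");
}
```

Running the same check with `StringComparer.Ordinal` should report no pairs, because ordinal comparison only looks at UTF-16 code unit values.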

Is this bug well-known? What versions of the framework are affected (I'm trying this with .NET 4.0)?

EDIT:

Here's an example where the sign is negative either way:

    x = "\u4E00\u30A0";         // equiv: "一゠"
    y = "\u4E00\u002D\u0041";   // equiv: "一-A"

785

asked Nov 06 '12 15:11

Jeppe Stig Nielsen

2 Answers

If correct sorting is important for your problem, use ordinal string comparison instead of culture-sensitive comparison. Only ordinal comparison guarantees the transitive and antisymmetric ordering you want.

What MSDN says:

Specifying the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase value in a method call signifies a non-linguistic comparison in which the features of natural languages are ignored. Methods that are invoked with these StringComparison values base string operation decisions on simple byte comparisons instead of casing or equivalence tables that are parameterized by culture. In most cases, this approach best fits the intended interpretation of strings while making code faster and more reliable.

And it works as expected:

    Console.WriteLine(String.Compare(x, y, StringComparison.Ordinal));  // -12309
    Console.WriteLine(String.Compare(y, x, StringComparison.Ordinal));  // 12309

Admittedly, this doesn't explain why the culture-sensitive comparison gives inconsistent results. Well: strange culture, strange result.
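Applied to the question's scenario, this advice amounts to passing an ordinal comparer everywhere a sorted order is needed. The following is a minimal sketch of that idea, not code from the answer; keep in mind that ordinal order follows UTF-16 code unit values, so it will not match Japanese (or any other) linguistic ordering.

```csharp
using System;
using System.Collections.Generic;

string x = "\u002D\u30A2"; // "-ア"
string y = "\u3042";       // "あ"

// Ordinal comparison compares UTF-16 code units, so the two results always have opposite signs.
Console.WriteLine(Math.Sign(string.Compare(x, y, StringComparison.Ordinal))); // -1
Console.WriteLine(Math.Sign(string.Compare(y, x, StringComparison.Ordinal))); // 1

// Sorting with an explicit ordinal comparer yields a well-defined order.
var words = new[] { y, "\u4E00\u30A0", x, "\u4E00\u002D\u0041" };
Array.Sort(words, StringComparer.Ordinal);
Console.WriteLine(string.Join(", ", words));

// Sorted collections accept the same comparer, so key lookups stay consistent.
var dict = new SortedDictionary<string, int>(StringComparer.Ordinal)
{
    [x] = 1,
    [y] = 2,
};
Console.WriteLine(dict.ContainsKey(x) && dict.ContainsKey(y)); // True
```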

150

answered Oct 12 '22 06:10

shuribot

I came across this SO post while trying to figure out why I was having problems retrieving (string) keys that had been inserted into a SortedList. By then I had discovered that the cause was the odd behaviour of the comparers in .NET 4.0 and above (a1 < a2 and a2 < a3, but a1 > a3).
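Transitivity can be checked by brute force in the same spirit. The sketch below uses a hypothetical helper, `FindTransitivityViolation`, that returns the first triple with a < b and b < c but not a < c, which is the pattern described here; the O(n³) loop is only meant for small test sets, and its result for culture-sensitive comparers depends on the runtime.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the first triple with a < b and b < c but NOT a < c,
// or null if the comparer is transitive on the given items. O(n^3): small inputs only.
static (string A, string B, string C)? FindTransitivityViolation(
    IComparer<string> comparer, IReadOnlyList<string> items)
{
    foreach (var a in items)
    {
        foreach (var b in items)
        {
            if (comparer.Compare(a, b) >= 0) continue; // need a < b
            foreach (var c in items)
            {
                if (comparer.Compare(b, c) < 0 && comparer.Compare(a, c) >= 0)
                {
                    return (a, b, c);
                }
            }
        }
    }
    return null;
}

var samples = new[] { "\u002D\u30A2", "\u3042", "\u4E00\u30A0", "\u4E00\u002D\u0041" };

// The outcome depends on the runtime's culture data.
var hit = FindTransitivityViolation(StringComparer.InvariantCulture, samples);
Console.WriteLine(hit is null ? "No violation found" : $"Violation: {hit}");
```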

My struggle to figure out what was going on can be found here: c# SortedList<string, TValue>.ContainsKey for successfully added key returns false.

You may want to have a look at the "UPDATE 3" section of my SO question. It appears that the issue was reported to Microsoft in December 2012 and closed before the end of January 2013 as "won't be fixed". Additionally, it lists a workaround that may be used.

I created an implementation of this recommended workaround, and verified that it fixed the problem that I had encountered. I also just verified that this resolves the issue you reported.

    public static void SO_13254153_Question() {
        string x = "\u002D\u30A2";  // or just "-ア" if charset allows
        string y = "\u3042";        // or just "あ" if charset allows

        // WorkAroundStringComparer is my implementation of the workaround (not shown here)
        var invariantComparer = new WorkAroundStringComparer();
        var japaneseComparer = new WorkAroundStringComparer(new System.Globalization.CultureInfo("ja-JP", false));
        Console.WriteLine(x.CompareTo(y));  // positive one
        Console.WriteLine(y.CompareTo(x));  // positive one
        Console.WriteLine(invariantComparer.Compare(x, y));  // negative one
        Console.WriteLine(invariantComparer.Compare(y, x));  // positive one
        Console.WriteLine(japaneseComparer.Compare(x, y));  // negative one
        Console.WriteLine(japaneseComparer.Compare(y, x));  // positive one
    }

The remaining problem is that this workaround is so slow it is hardly practical for use with large collections of strings. So I hope Microsoft will reconsider closing this issue or that someone knows of a better workaround.
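One alternative, which is a different technique from the workaround referenced above and is offered here only as a sketch, is to order strings by their culture sort keys. `CompareInfo.GetSortKey` returns a `SortKey` whose `KeyData` bytes encode the linguistic weights, and comparing those bytes lexicographically is transitive and antisymmetric by construction. This assumes an order derived from sort keys is acceptable even where it disagrees with `CompareInfo.Compare` for some pairs; `CreateSortKeyComparer` is a hypothetical name, and caching the keys is an attempt to keep the cost down for large collections, not a measured result.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: orders strings by the bytes of their culture sort keys.
// Lexicographic byte comparison is a total preorder, so the comparer is transitive
// and antisymmetric. Keys are cached per string; the cache is NOT thread-safe and
// grows without bound.
static IComparer<string> CreateSortKeyComparer(CompareInfo compareInfo)
{
    var cache = new Dictionary<string, byte[]>(StringComparer.Ordinal);

    byte[] KeyOf(string s)
    {
        if (!cache.TryGetValue(s, out var key))
        {
            key = compareInfo.GetSortKey(s, CompareOptions.None).KeyData;
            cache[s] = key;
        }
        return key;
    }

    return Comparer<string>.Create((a, b) =>
    {
        if (ReferenceEquals(a, b)) return 0;
        if (a is null) return -1; // nulls sort first
        if (b is null) return 1;

        byte[] ka = KeyOf(a), kb = KeyOf(b);
        int n = Math.Min(ka.Length, kb.Length);
        for (int i = 0; i < n; i++)
        {
            if (ka[i] != kb[i]) return ka[i] < kb[i] ? -1 : 1;
        }
        return ka.Length.CompareTo(kb.Length); // shorter key first on a common prefix
    });
}

var comparer = CreateSortKeyComparer(CultureInfo.InvariantCulture.CompareInfo);
string x = "\u002D\u30A2", y = "\u3042";
Console.WriteLine(Math.Sign(comparer.Compare(x, y)) == -Math.Sign(comparer.Compare(y, x))); // True

// Usable anywhere an IComparer<string> is accepted, e.g. a sorted collection:
var dict = new SortedDictionary<string, int>(comparer) { [x] = 1, [y] = 2 };
```

If the comparer is shared across threads, the cache would need locking or a concurrent dictionary, and whether this is actually faster than the workaround mentioned above would have to be measured.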

answered Oct 12 '22 06:10

Alex
