
Why does ("abc"+char.MaxValue).CompareTo("abc") return 0?

Tags:

c#

I have a sorted array of strings. Given a string that identifies a prefix, I perform two binary searches to find the first and last positions in the array that contain words that start with that prefix:

string [] words = {"aaa","abc","abcd","acd"};
string prefix = "abc";
int firstPosition = Array.BinarySearch<string>(words, prefix);
int lastPosition = Array.BinarySearch<string>(words, prefix + char.MaxValue);
if (firstPosition < 0)
    firstPosition = ~firstPosition;
if (lastPosition < 0)
    lastPosition = ~lastPosition;

Running this code I get firstPosition and lastPosition both equal to 1, while the right answer is to have lastPosition equal to 3 (i.e., pointing to the first non-matching word). The BinarySearch method uses the CompareTo method to compare the objects and I have found that

("abc"+char.MaxValue).CompareTo("abc")==0

meaning that the two strings are considered equal! If I change the code to

int lastPosition = Array.BinarySearch<string>(words, prefix + "z");

I get the right answer. Moreover I have found that

("abc"+char.MaxValue)==("abc")

correctly (with respect to my needs) returns false.

Could you please help me understand the behavior of the CompareTo method?

I would like the CompareTo method to behave like ==, so that the BinarySearch method yields 3 for lastPosition.

Esuli asked Nov 20 '12 11:11


2 Answers

According to MSDN, string.CompareTo should not be used to check whether two strings are equal:

The CompareTo method was designed primarily for use in sorting or alphabetizing operations. It should not be used when the primary purpose of the method call is to determine whether two strings are equivalent. To determine whether two strings are equivalent, call the Equals method.

To get the behavior you wish, you could make use of the overload that accepts an IComparer<T>:

int lastPosition = Array.BinarySearch<string>(words, prefix + char.MaxValue, 
                                              StringComparer.Ordinal);

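This will return -4 for lastPosition, because the array contains no string equal to prefix + char.MaxValue (-4 is ~3, the bitwise complement of the index where that string would be inserted). I don't understand why you expect 3 in that case...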
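As a minimal sketch, the two searches from the question can be combined with the ordinal comparer as follows. The class and method names (PrefixRange, Find) are made up for illustration, and the sketch assumes the array is already sorted in ordinal order, which holds for the sample data:

```csharp
using System;

static class PrefixRange
{
    // Hypothetical helper: returns the index of the first word that starts
    // with the prefix and the index just past the last such word.
    // Assumes 'words' is sorted in ordinal order.
    public static (int First, int End) Find(string[] words, string prefix)
    {
        int first = Array.BinarySearch(words, prefix, StringComparer.Ordinal);
        int end = Array.BinarySearch(words, prefix + char.MaxValue, StringComparer.Ordinal);

        // A negative result is the bitwise complement of the insertion index.
        if (first < 0) first = ~first;
        if (end < 0) end = ~end;
        return (first, end);
    }

    static void Main()
    {
        string[] words = { "aaa", "abc", "abcd", "acd" };
        var range = Find(words, "abc");
        Console.WriteLine($"{range.First} {range.End}"); // prints "1 3"
    }
}
```

After applying ~ to the negative result, the end index is 3, which is the value the question was looking for.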

Daniel Hilgarth answered Sep 30 '22 02:09


string.CompareTo() does a current-culture compare, equivalent to using StringComparer.CurrentCulture, whereas the string equality operator (==) does an ordinal compare of the individual char values.

For example, if the current culture is German ("de-DE"), "ss" and "ß" compare as equal:

Console.WriteLine("ss".CompareTo("ß")); // => 0
Console.WriteLine("ss" == "ß"); // => false

What you want is an ordinal compare, which ignores culture rules entirely and which you will get by using StringComparer.Ordinal:

StringComparer.Ordinal.Compare("ss", "ß"); // => -108
StringComparer.Ordinal.Compare("abc"+char.MaxValue, "abc"); // => 65535
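
To see the difference on the strings from the question, here is a small sketch (the class name CompareDemo is made up for illustration). The culture-sensitive result is printed without an expected value because it depends on the current culture and on the runtime's globalization library; the ordinal results do not:

```csharp
using System;

static class CompareDemo
{
    static void Main()
    {
        string a = "abc" + char.MaxValue;
        string b = "abc";

        // Culture-sensitive compare, the kind CompareTo performs.
        // The value depends on the current culture and the runtime.
        Console.WriteLine(string.Compare(a, b, StringComparison.CurrentCulture));

        // Ordinal compares look only at the UTF-16 code units.
        Console.WriteLine(string.CompareOrdinal(a, b) > 0);               // True
        Console.WriteLine(string.Equals(a, b, StringComparison.Ordinal)); // False
        Console.WriteLine(a == b);                                        // False
    }
}
```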

ulrichb answered Sep 30 '22 04:09