I have a sorted array of strings. Given a string that identifies a prefix, I perform two binary searches to find the first and last positions in the array that contain words that start with that prefix:
string[] words = {"aaa", "abc", "abcd", "acd"};
string prefix = "abc";
int firstPosition = Array.BinarySearch<string>(words, prefix);
int lastPosition = Array.BinarySearch<string>(words, prefix + char.MaxValue);
if (firstPosition < 0)
    firstPosition = ~firstPosition;
if (lastPosition < 0)
    lastPosition = ~lastPosition;
Running this code, I get firstPosition and lastPosition both equal to 1, whereas lastPosition should be 3 (i.e., pointing to the first non-matching word). The BinarySearch method uses the CompareTo method to compare the objects, and I have found that
("abc"+char.MaxValue).CompareTo("abc")==0
meaning that the two strings are considered equal! If I change the code to
int lastPosition = Array.BinarySearch<string>(words, prefix + "z");
I get the right answer. Moreover I have found that
("abc"+char.MaxValue)==("abc")
correctly (with respect to my needs) returns false.
Could you please help me understand the behavior of the CompareTo method?
I would like CompareTo to behave like ==, so that BinarySearch returns 3 for lastPosition.
According to MSDN, string.CompareTo should not be used to check whether two strings are equal:
The CompareTo method was designed primarily for use in sorting or alphabetizing operations. It should not be used when the primary purpose of the method call is to determine whether two strings are equivalent. To determine whether two strings are equivalent, call the Equals method.
To get the behavior you want, you could use the overload that accepts an IComparer<T>:
int lastPosition = Array.BinarySearch<string>(words, prefix + char.MaxValue,
StringComparer.Ordinal);
This will return -4 for lastPosition, as the array contains no element equal to prefix + char.MaxValue. A negative result is the bitwise complement of the insertion index, so applying ~ to it, as your code already does, gives 3.
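Putting the pieces together, a minimal sketch of the whole prefix-range lookup might look like the following. The class and method names (PrefixSearch, FindRange) are made up for illustration, and the sketch assumes that the array is sorted in ordinal order (for example with Array.Sort(words, StringComparer.Ordinal)) and that no word contains char.MaxValue.

```csharp
using System;

// Hypothetical helper, not part of the original question or answers.
public static class PrefixSearch
{
    // Returns the half-open range [First, End) of words that start with prefix.
    // Assumption: words is sorted with StringComparer.Ordinal and no word
    // contains char.MaxValue.
    public static (int First, int End) FindRange(string[] words, string prefix)
    {
        // Use the same ordinal comparer for both searches.
        int first = Array.BinarySearch(words, prefix, StringComparer.Ordinal);
        if (first < 0)
        {
            // Not found: the complement is the index where prefix would be inserted.
            first = ~first;
        }
        else
        {
            // Found: BinarySearch may hit any duplicate, so walk back to the first one.
            while (first > 0 && words[first - 1] == prefix)
                first--;
        }

        // prefix + char.MaxValue sorts after every word that starts with prefix.
        int end = Array.BinarySearch(words, prefix + char.MaxValue, StringComparer.Ordinal);
        // Under the stated assumption this search never finds an exact match,
        // so the complement branch is the one that normally runs.
        end = end < 0 ? ~end : end + 1;

        return (first, end);
    }
}
```

For the array in the question, PrefixSearch.FindRange(words, "abc") should give First = 1 and End = 3, matching the result the question asks for.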
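string.CompareTo() does a current-culture compare, equivalent to using StringComparer.CurrentCulture, whereas the string equality operator (==) does an ordinal compare that looks only at the UTF-16 code units and ignores culture rules. Culture-sensitive comparisons can treat some characters as ignorable, and the result in the question suggests that char.MaxValue (U+FFFF) is one of them.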
For example, if the current culture is German ("de-DE"), "ss" and "ß" compare as equal under CompareTo but not under ==:
Console.WriteLine("ss".CompareTo("ß")); // => 0
Console.WriteLine("ss" == "ß"); // => false
What you want is an ordinal compare, which ignores culture rules entirely and which you get by using StringComparer.Ordinal:
StringComparer.Ordinal.Compare("ss", "ß"); // => -108
StringComparer.Ordinal.Compare("abc"+char.MaxValue, "abc"); // => 65535
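Only the sign of a comparison result is meaningful; the exact magnitude can vary between runtime versions. As a rough sketch of the three comparison modes discussed here, the hypothetical helper below (the class and method names are invented for illustration) reduces each result to its sign so the modes can be compared side by side.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ComparisonModes
{
    // Sign of the ordinal comparison: code unit by code unit, no culture rules.
    public static int OrdinalSign(string a, string b) =>
        Math.Sign(string.CompareOrdinal(a, b));

    // Sign of the culture-sensitive comparison, the kind CompareTo performs.
    // The result depends on the current culture and on the runtime's
    // globalization library, so no particular value is assumed here.
    public static int CultureSign(string a, string b) =>
        Math.Sign(string.Compare(a, b, StringComparison.CurrentCulture));

    // The == operator on strings is an ordinal equality check.
    public static bool OperatorEquals(string a, string b) => a == b;
}
```

With this, ComparisonModes.OrdinalSign("abc" + char.MaxValue, "abc") is positive and OperatorEquals on the same pair is false, while CultureSign may report 0, as observed in the question.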