
string.Empty.StartsWith(((char)10781).ToString()) always returns true?

I am trying to handle the following character: ⨝ (http://www.fileformat.info/info/unicode/char/2a1d/index.htm)

If you check whether an empty string starts with this character, the result is always true, which does not make any sense! Why is that?

// Visual Studio 2008 hides lines that contain this char literally (bug in Visual Studio?!?), so I wrote its Unicode code point instead.
char specialChar = (char)10781;
string specialString = specialChar.ToString();

// prints 1
Console.WriteLine(specialString.Length);

// prints 10781
Console.WriteLine((int)specialChar);

// prints false
Console.WriteLine(string.Empty.StartsWith("A"));

// both print true WTF?!?
Console.WriteLine(string.Empty.StartsWith(specialString));
Console.WriteLine(string.Empty.StartsWith(((char)10781).ToString()));
DxCK asked Dec 12 '09 11:12


3 Answers

You can avoid this behavior by passing an ordinal StringComparison:

From the MSDN docs:

When you specify either StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, the string comparison will be non-linguistic. That is, the features that are specific to the natural language are ignored when making comparison decisions. This means the decisions are based on simple byte comparisons and ignore casing or equivalence tables that are parameterized by culture. As a result, by explicitly setting the parameter to either the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, your code often gains speed, increases correctness, and becomes more reliable.

    char specialChar = (char)10781;
    string specialString = Convert.ToString(specialChar);

    // prints 1
    Console.WriteLine(specialString.Length);

    // prints 10781
    Console.WriteLine((int)specialChar);

    // prints false
    Console.WriteLine(string.Empty.StartsWith("A"));

    // prints false
    Console.WriteLine(string.Empty.StartsWith(specialString, StringComparison.Ordinal));
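
If the ordinal check is needed in several places, it could be wrapped once so the comparison type is never left to the default. This is only a minimal sketch: the class `OrdinalPrefixDemo` and the helper `StartsWithOrdinal` are hypothetical names, not part of the framework or of the original question.

```csharp
using System;

public static class OrdinalPrefixDemo
{
    // Hypothetical helper (an assumption for illustration): compares the
    // UTF-16 code units directly, with no culture tables involved.
    public static bool StartsWithOrdinal(string text, string prefix)
    {
        return text.StartsWith(prefix, StringComparison.Ordinal);
    }

    public static void Main()
    {
        string specialString = ((char)10781).ToString();

        // An empty string has no first character, so this prints False.
        Console.WriteLine(StartsWithOrdinal(string.Empty, specialString));

        // The character really is the prefix here, so this prints True.
        Console.WriteLine(StartsWithOrdinal(specialString + "abc", specialString));

        // An empty prefix matches any string, even ordinally: prints True.
        Console.WriteLine(StartsWithOrdinal("abc", string.Empty));
    }
}
```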
RichardOD answered Sep 19 '22 00:09


Nice unicode glitch ;-p

I'm not sure why it does this, but amusingly:

Console.WriteLine(string.Empty.StartsWith(specialString)); // true
Console.WriteLine(string.Empty.Contains(specialString)); // false
Console.WriteLine("abc".StartsWith(specialString)); // true
Console.WriteLine("abc".Contains(specialString)); // false

I'm guessing this is treated a bit like the non-joining character that Jon mentioned at DevDays: some string functions see it, and some don't. If a function doesn't see it, the question becomes "does (some string) start with an empty string", which is always true.
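
One way to probe that guess is to ask the culture-sensitive comparer directly whether it considers the character equal to an empty string. The sketch below assumes nothing beyond the standard `CompareInfo.Compare` overload; the class `IgnorableCharProbe` and the helper `ComparesEqualToEmpty` are hypothetical names. The last result depends on the runtime and its collation data, so no output is claimed for it.

```csharp
using System;
using System.Globalization;

public static class IgnorableCharProbe
{
    // Hypothetical helper (an assumption for illustration): true when the
    // culture-sensitive comparer treats 'value' as equal to the empty string.
    public static bool ComparesEqualToEmpty(string value, CultureInfo culture)
    {
        return culture.CompareInfo.Compare(value, string.Empty, CompareOptions.None) == 0;
    }

    public static void Main()
    {
        string specialString = ((char)10781).ToString();

        // Every string starts with the empty string: prints True.
        Console.WriteLine("abc".StartsWith(string.Empty));

        // Contains(string) is documented as an ordinal comparison, so it
        // looks for the actual character and prints False.
        Console.WriteLine("abc".Contains(specialString));

        // Runtime-dependent: if this prints True, the comparer is ignoring
        // the character, which would explain the StartsWith result.
        Console.WriteLine(ComparesEqualToEmpty(specialString, CultureInfo.CurrentCulture));
    }
}
```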

Marc Gravell answered Sep 20 '22 00:09


The underlying reason is that the default string comparison is locale-aware. This means it uses tables of locale data for comparisons (including equality).
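
To make the "default is locale-aware" point concrete: the single-argument `StartsWith(string)` overload is documented as a culture-sensitive comparison using the current culture, so it should always agree with the explicit `StringComparison.CurrentCulture` overload. A small sketch, where `DefaultComparisonCheck` and `DefaultMatchesCurrentCulture` are hypothetical names introduced for illustration:

```csharp
using System;

public static class DefaultComparisonCheck
{
    // Hypothetical helper (an assumption for illustration): checks that the
    // default overload and the explicit current-culture overload agree.
    public static bool DefaultMatchesCurrentCulture(string text, string prefix)
    {
        bool byDefault = text.StartsWith(prefix);
        bool byCulture = text.StartsWith(prefix, StringComparison.CurrentCulture);
        return byDefault == byCulture;
    }

    public static void Main()
    {
        string specialString = ((char)10781).ToString();

        // Both lines print True: the two overloads perform the same comparison,
        // whatever that comparison decides about the special character.
        Console.WriteLine(DefaultMatchesCurrentCulture(string.Empty, specialString));
        Console.WriteLine(DefaultMatchesCurrentCulture("abc", "A"));
    }
}
```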

Many (if not most) Unicode characters have no entry in the tables for a given locale, so a comparison may treat them as if they don't exist (or as matching anything, or nothing).

See the entries on character weights on Michael Kaplan's blog "Sorting It All Out". That series of posts contains a lot of background information (the APIs are native but, as I understand it, the mechanisms in .NET are the same).

Quick version: this is a complex area, and getting expected (natural-language) comparisons right is hard. This tends to lead to odd results with code points for glyphs outside your language.

Richard answered Sep 22 '22 00:09