The string "\u1FFF:foo"
starts with \u1FFF
(or ""), right?
So how can these both be true?
"\u1FFF:foo".StartsWith(":") // equals true
"\u1FFF:foo".StartsWith("\u1FFF") // equals true
// alternatively, the same:
":foo".StartsWith(":") // equals true
":foo".StartsWith("") // equals true
Does .NET claim that this string starts with two different characters?
And while I find this very surprising and would like to understand the "why", I'm equally interested in how I can force .NET to search exclusively by codepoints instead (using InvariantCulture
doesn't seem to do a thing)?
And for comparison, one characters below that, "\u1FFE:foo".StartsWith(":")
returns false.
That a string in general might be considered to start with two different strings that are not byte-for-byte identical is not surprising (because Unicode is complicated). For example, these results are almost always going to reflect what a user wants:
"n\u0303".StartsWith("\u00f1") // true
"n\u0303".StartsWith("n") // false
Using System.Globalization.CharUnicodeInfo.GetUnicodeCategory
, you can see that '\u1fff'
is in the "OtherNotAssigned" category; it's unclear to me whether that should affect string search/sort/comparison operations (it does not appear to affect normalization, that is, the characters remain after normalization).
If you want a byte-for-byte comparison, use StringComparison.Ordinal
.
Because you are using String.StartsWith()
incorrectly. You should use String.StartsWith (String, StringComparison)
overload and StringComparison.Ordinal
.
There is no character assigned to \u1FFF
. I.e. there is no linguistic meaning attached to this code. See Greek Extended, Range: 1F00–1FFF excerpt from character code tables for Unicode Standard. Best Practices for Using Strings in .NET document from MSDN explicitly states that if you need to compare strings in a manner that ignores features of natural languages then you should use StringComparison.Ordinal
:
Specifying the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase value in a method call signifies a non-linguistic comparison in which the features of natural languages are ignored. Methods that are invoked with these StringComparison values base string operation decisions on simple byte comparisons instead of casing or equivalence tables that are parameterized by culture. In most cases, this approach best fits the intended interpretation of strings while making code faster and more reliable.
Moreover, it recommends to always explicitly specify StringComparison
in such method calls:
When you develop with .NET, follow these simple recommendations when you use strings:
- Use overloads that explicitly specify the string comparison rules for string operations. Typically, this involves calling a method overload that has a parameter of type StringComparison.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With