IndexOf matching when Unicode 0xFFFD is in the string - bug or feature?

Tags: c#, indexof

In C# (Visual Studio 2012), the following code:

string test = "[ " + (char)0xFFFD + " ]";
System.Console.WriteLine("{0}", test.IndexOf("  ") == 1);

prints

True

to the console output window. The two spaces are separated by U+FFFD, yet the search for two consecutive spaces matches at index 1. Is this expected behavior (a feature), or a (known) bug?

ShamilS asked May 20 '14 22:05


1 Answer

It's an expected result. U+FFFD is the Unicode "replacement character" and is not meaningful in any culture. IndexOf(string) performs a culture-sensitive search by default, and that kind of search ignores non-meaningful characters:

Character sets include ignorable characters, which are characters that are not considered when performing a linguistic or culture-sensitive comparison.
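
To see the difference, the same search can be run with an explicit comparison type. This is a minimal sketch, not part of the original answer; the names IndexOfDemo and OrdinalIndexOf are hypothetical. StringComparison.Ordinal compares UTF-16 code units directly, so U+FFFD is not skipped. The culture-sensitive result is the one reported in the question and may differ on runtimes that use different globalization data, so no output is asserted for it here.

```csharp
using System;

public static class IndexOfDemo
{
    // Hypothetical helper: searches by raw UTF-16 code units,
    // so no character is treated as ignorable.
    public static int OrdinalIndexOf(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal);
    }

    public static void Main()
    {
        string test = "[ " + (char)0xFFFD + " ]";

        // Culture-sensitive search, which is what IndexOf(string) does by default.
        // The question reports a match at index 1 for this case.
        int linguistic = test.IndexOf("  ", StringComparison.CurrentCulture);

        // Ordinal search: the two spaces are not adjacent, so there is no match.
        int ordinal = OrdinalIndexOf(test, "  ");

        Console.WriteLine("CurrentCulture: {0}", linguistic);
        Console.WriteLine("Ordinal: {0}", ordinal); // prints "Ordinal: -1"
    }
}
```

If the intent is to find two literally adjacent spaces, passing StringComparison.Ordinal explicitly is probably the safer choice.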

D Stanley answered Nov 15 '22 00:11