I am creating a substring from a string with non-combining diacritics that follow a space. When doing so, I check the string with `.Contains()` and then take the substring. When I pass a space `char` to `.IndexOf()`, the program behaves as expected, yet when I pass the string `" "` to `.IndexOf()`, the program throws an exception. As the samples below show, only a space string that immediately precedes the primary stress diacritic (U+02C8) leads to an `ArgumentOutOfRangeException`.
Simple code (Edit suggested by John):
```csharp
string a = "aɪ prɪˈzɛnt";
string b = "maɪ ˈprɛznt";

// A
Console.WriteLine(a.IndexOf(" ")); // string index: 2
Console.WriteLine(a.IndexOf(' ')); // char index: 2

// B
Console.WriteLine(b.IndexOf(" ")); // string index: -1
Console.WriteLine(b.IndexOf(' ')); // char index: 3
```
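To see which character actually follows the space in each string, it can help to dump the UTF-16 code units. The sketch below (the class `CodePointDump` and its `Describe` method are hypothetical names, used only for illustration) shows that in `b` the space at index 3 is immediately followed by U+02C8 MODIFIER LETTER VERTICAL LINE, the IPA primary stress mark, whereas in `a` the space is followed by an ordinary `p`.

```csharp
using System;
using System.Linq;

static class CodePointDump
{
    // Hypothetical helper: formats each UTF-16 code unit of s as "index:U+XXXX".
    // Every character in these IPA strings is in the BMP, so one char is one code point here.
    public static string Describe(string s) =>
        string.Join(" ", s.Select((c, i) => $"{i}:U+{(int)c:X4}"));

    static void Main()
    {
        string a = "aɪ prɪˈzɛnt";
        string b = "maɪ ˈprɛznt";

        Console.WriteLine(Describe(a));
        Console.WriteLine(Describe(b));

        // Ordinal char search for the space, then inspect the character right after it.
        int space = b.IndexOf(' ');
        Console.WriteLine($"b[{space + 1}] = U+{(int)b[space + 1]:X4}"); // b[4] = U+02C8
    }
}
```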
Sample code I tested with:
```csharp
const string iPresent = "aɪ prɪˈzɛnt",
             myPresent = "maɪ ˈprɛznt";

// iPresent: the space is followed by 'p', so every variant finds it.
if (iPresent.Contains(' '))
{
    Console.WriteLine(iPresent.Substring(0, iPresent.IndexOf(' ')));
}
if (iPresent.Contains(" "[0]))               // " "[0] is the char ' '
{
    Console.WriteLine(iPresent.Substring(0, iPresent.IndexOf(" "[0])));
}
if (iPresent.Contains(" "))
{
    Console.WriteLine(iPresent.Substring(0, iPresent.IndexOf(" ")));
}
if (iPresent.Contains(string.Empty + ' '))   // string.Empty + ' ' is the string " "
{
    Console.WriteLine(iPresent.Substring(0, iPresent.IndexOf(string.Empty + ' ')));
}

// myPresent: the space is immediately followed by U+02C8.
if (myPresent.Contains(' '))
{
    Console.WriteLine(myPresent.Substring(0, myPresent.IndexOf(' ')));
}
if (myPresent.Contains(" "[0]))
{
    Console.WriteLine(myPresent.Substring(0, myPresent.IndexOf(" "[0])));
}
if (myPresent.Contains(string.Empty + ' '))
{
    try
    {
        // The string overload of IndexOf is the one that fails here.
        Console.WriteLine(myPresent.Substring(0, myPresent.IndexOf(string.Empty + ' ')));
    }
    catch (Exception ex)
    {
        Console.WriteLine("***" + ex.Message);
    }
}
if (myPresent.Contains(" "))
{
    try
    {
        Console.WriteLine(myPresent.Substring(0, myPresent.IndexOf(" ")));
    }
    catch (Exception ex)
    {
        Console.WriteLine("***" + ex.Message);
    }
}
```
`IndexOf(string)` does something different from `IndexOf(char)`, because `IndexOf(char)`...

...performs an ordinal (culture-insensitive) search, where a character is considered equivalent to another character only if their Unicode scalar values are the same.

whereas `IndexOf(string)`...

...performs a word (case-sensitive and culture-sensitive) search using the current culture.

So it's a whole lot "smarter" than `IndexOf(char)`, because it takes into account the string comparison rules of the current culture. This is why it doesn't find the space character. `Contains(string)`, on the other hand, performs an ordinal comparison, so `myPresent.Contains(" ")` returns `true` while `myPresent.IndexOf(" ")` returns -1, and `Substring(0, -1)` then throws the `ArgumentOutOfRangeException`.
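A defensive pattern, sketched below, is to drop the separate `Contains` check and let a single `IndexOf` call with an explicit `StringComparison` do both the check and the search, so the two can never disagree. The class `PrefixHelper` and the method `TryGetPrefix` are hypothetical names chosen for this sketch; only `String.IndexOf(string, StringComparison)` and `String.Substring` are real .NET APIs here.

```csharp
using System;

static class PrefixHelper
{
    // Hypothetical helper: gets the part of s before the first occurrence of
    // separator under the given comparison. One IndexOf call serves as both the
    // "contains" check and the search, so a -1 can never reach Substring.
    public static bool TryGetPrefix(string s, string separator, StringComparison comparison, out string prefix)
    {
        int index = s.IndexOf(separator, comparison);
        if (index < 0)
        {
            prefix = string.Empty; // separator not found under this comparison
            return false;
        }
        prefix = s.Substring(0, index);
        return true;
    }

    static void Main()
    {
        const string myPresent = "maɪ ˈprɛznt";

        // Ordinal: compares UTF-16 code units, so the space is found at index 3.
        if (TryGetPrefix(myPresent, " ", StringComparison.Ordinal, out string ordinalPrefix))
            Console.WriteLine(ordinalPrefix); // maɪ

        // Current culture: the outcome depends on the runtime and its globalization
        // data, so this branch may report "not found" on some platforms.
        Console.WriteLine(TryGetPrefix(myPresent, " ", StringComparison.CurrentCulture, out string culturePrefix)
            ? culturePrefix
            : "(not found)");
    }
}
```

Either way the code no longer throws; it just reports whether the separator was found under the comparison you asked for.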
After some testing in other languages and platforms, I suspect this is a bug in .NET Framework, because in .NET Core 3.1 `b.IndexOf(" ")` doesn't return -1, and neither does `b.IndexOf(' ', StringComparison.CurrentCulture)`. Other languages/platforms where "maɪ ˈprɛznt" contains a space culture-sensitively include:
Passing in `StringComparison.Ordinal` works:

```csharp
b.IndexOf(" ", StringComparison.Ordinal)
```

But do note that you lose the "smartness" of culture-sensitive comparison.
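The sketch below illustrates both sides of that trade-off. With `StringComparison.Ordinal`, the string overload agrees with `IndexOf(char)` on the question's input. What ordinal gives up is shown with a different, assumed example: a precomposed `é` (U+00E9) versus `e` followed by COMBINING ACUTE ACCENT (U+0301), which are canonically equivalent but differ code unit by code unit. The class `OrdinalDemo` and method `OrdinalSpaceIndex` are hypothetical names for this illustration, and the culture-sensitive result is printed without a claimed value because it depends on the runtime's globalization settings.

```csharp
using System;

static class OrdinalDemo
{
    // Hypothetical helper: index of the first space using an explicit ordinal string search.
    public static int OrdinalSpaceIndex(string s) => s.IndexOf(" ", StringComparison.Ordinal);

    static void Main()
    {
        string b = "maɪ ˈprɛznt";

        // With StringComparison.Ordinal the string overload matches the char overload.
        Console.WriteLine(OrdinalSpaceIndex(b)); // 3
        Console.WriteLine(b.IndexOf(' '));       // 3

        // What ordinal gives up: canonically equivalent forms are not matched,
        // because ordinal search compares raw UTF-16 code units.
        string precomposed = "caf\u00E9"; // "café" with a single é code point
        string decomposed = "e\u0301";    // 'e' + COMBINING ACUTE ACCENT
        Console.WriteLine(precomposed.IndexOf(decomposed, StringComparison.Ordinal)); // -1

        // A culture-sensitive search may treat the two forms as equal; the exact
        // result depends on the runtime and its globalization data.
        Console.WriteLine(precomposed.IndexOf(decomposed, StringComparison.CurrentCulture));
    }
}
```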