I want to know whether I have found a bug in the .NET Framework or am just misunderstanding something. After running this piece of code:
var text = "مباركُ وبعض أكثر من نص";
var word = "مبارك";
bool exist = text.Contains(word);
int index = text.IndexOf(word);
The results are exist = true and index = -1.
How can that be?
As of .NET 4.0, IndexOf(string) does not use an ordinal comparison, so Contains (which does) can be faster.
Contains is culture-insensitive: "This method performs an ordinal (case-sensitive and culture-insensitive) comparison."
IndexOf is culture-sensitive: "This method performs a word (case-sensitive and culture-sensitive) search using the current culture."
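To see the two documented behaviours side by side, here is a minimal sketch (C# top-level statements, .NET 5 or later assumed) that runs each call next to its explicit StringComparison equivalent. The culture-sensitive index itself depends on the current culture and on whether NLS or ICU is in use, so it is printed rather than predicted; only the pairings are expected to hold everywhere.

```csharp
using System;
using System.Globalization;

var text = "مباركُ وبعض أكثر من نص";
var word = "مبارك";

// Contains(string) is documented as an ordinal search,
// so it should agree with an explicit StringComparison.Ordinal IndexOf.
bool contains = text.Contains(word);
bool ordinalFound = text.IndexOf(word, StringComparison.Ordinal) >= 0;

// IndexOf(string) is documented as a culture-sensitive search using the current culture,
// so it should agree with an explicit StringComparison.CurrentCulture IndexOf.
int defaultIndex = text.IndexOf(word);
int currentCultureIndex = text.IndexOf(word, StringComparison.CurrentCulture);

Console.WriteLine($"Current culture:          '{CultureInfo.CurrentCulture.Name}'");
Console.WriteLine($"Contains:                 {contains}");
Console.WriteLine($"IndexOf (Ordinal) >= 0:   {ordinalFound}");
Console.WriteLine($"IndexOf (default):        {defaultIndex}");
Console.WriteLine($"IndexOf (CurrentCulture): {currentCultureIndex}");
```

If the default IndexOf prints -1 on your machine while Contains prints True, you are seeing exactly the discrepancy from the question.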
That's the difference. If you use
int index = text.IndexOf(word, StringComparison.Ordinal);
then you'll get an index of 0 instead of -1, which is consistent with Contains.
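A small sketch of the ordinal fix (C# top-level statements, .NET 5 or later assumed) that also prints the character immediately after the match. In the question's string that character appears to be a combining diacritic on the final kaf; that this mark is why the culture-sensitive search declines to match only part of the letter is a plausible guess, not something verified here.

```csharp
using System;
using System.Globalization;

var text = "مباركُ وبعض أكثر من نص";
var word = "مبارك";

// Ordinal search compares UTF-16 code units one by one,
// so the bare kaf in word matches the kaf in text even though a mark follows it.
int index = text.IndexOf(word, StringComparison.Ordinal);
Console.WriteLine($"Ordinal index: {index}");

if (index >= 0 && index + word.Length < text.Length)
{
    // The code unit that follows the matched word in the source text.
    char next = text[index + word.Length];
    UnicodeCategory category = char.GetUnicodeCategory(next);
    Console.WriteLine($"Next char: U+{(int)next:X4} ({category})");
}
```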
In .NET Framework there's no culture-sensitive overload of Contains (.NET Core 2.1 and later add Contains(string, StringComparison)); it's unclear to me whether you can use IndexOf reliably for this, but the CompareInfo class gives some more options. (I really don't know much about the details of cultural comparisons, particularly with RTL text. I just know it's complicated!)
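As a rough illustration of what CompareInfo offers, here is a sketch of a culture-aware "contains" check built on CompareInfo.IndexOf (C# top-level statements, .NET 5 or later assumed). ContainsCulture is a hypothetical helper name, and the ar-SA culture and CompareOptions.IgnoreNonSpace (which asks the comparer to ignore nonspacing combining characters such as diacritics) are illustrative choices, not a recommendation. The Arabic result is printed rather than predicted, because it can differ between NLS, ICU and globalization-invariant mode.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: a culture-aware "contains" built on CompareInfo.IndexOf,
// which returns -1 when value is not found under the given culture and options.
static bool ContainsCulture(string source, string value, CultureInfo culture, CompareOptions options)
    => culture.CompareInfo.IndexOf(source, value, options) >= 0;

var text = "مباركُ وبعض أكثر من نص";
var word = "مبارك";

CultureInfo arabic;
try
{
    arabic = CultureInfo.GetCultureInfo("ar-SA");
}
catch (CultureNotFoundException)
{
    // Specific cultures can be unavailable, e.g. when globalization-invariant mode is enabled.
    arabic = CultureInfo.InvariantCulture;
}

// IgnoreNonSpace: ignore nonspacing combining characters such as Arabic diacritics.
bool arabicMatch = ContainsCulture(text, word, arabic, CompareOptions.IgnoreNonSpace);
Console.WriteLine($"'{arabic.Name}', IgnoreNonSpace: {arabicMatch}");

// An ASCII check that should hold regardless of the globalization mode.
bool ignoreCaseMatch = ContainsCulture("Hello World", "WORLD", CultureInfo.InvariantCulture, CompareOptions.IgnoreCase);
Console.WriteLine($"Invariant, IgnoreCase: {ignoreCaseMatch}");
```

Wrapping the comparison in a helper keeps the culture and options explicit at every call site, which is the main thing the default IndexOf(string) overload hides.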