
Why are String.IndexOf and String.Contains disagreeing when provided with Arabic text?

Tags: .net, arabic

I want to know if I found a bug in the .NET Framework, or if I don't understand something. After running this piece of code:

var text = "مباركُ وبعض أكثر من نص";
var word = "مبارك";
bool exist = text.Contains(word);
int index = text.IndexOf(word);

The results are exist = true and index = -1.

How can it be?

asked Sep 11 '13 06:09 by gil kr



1 Answer

Contains is culture-insensitive. Its documentation says:

"This method performs an ordinal (case-sensitive and culture-insensitive) comparison."

IndexOf(string) is culture-sensitive. Its documentation says:

"This method performs a word (case-sensitive and culture-sensitive) search using the current culture."

That's the difference. If you use

int index = text.IndexOf(word, StringComparison.Ordinal);

then you'll get an index of 0 instead of -1 (so it's consistent with Contains).
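As a minimal sketch of that fix, the snippet below reuses the strings from the question and wraps the ordinal call in a helper. The names OrdinalSearchDemo and OrdinalIndexOf are hypothetical and introduced only for illustration. The culture-sensitive result is printed without any stated expectation, because it may vary with the current culture and the runtime's globalization data.

```csharp
using System;

public static class OrdinalSearchDemo
{
    // Hypothetical helper: index of word in text, comparing UTF-16 code units
    // one by one, with no culture rules applied.
    public static int OrdinalIndexOf(string text, string word)
    {
        return text.IndexOf(word, StringComparison.Ordinal);
    }

    public static void Main()
    {
        // Strings from the question; the text has a damma (U+064F) right after the word.
        var text = "مباركُ وبعض أكثر من نص";
        var word = "مبارك";

        // Contains(string) always performs an ordinal comparison.
        bool exist = text.Contains(word);

        // Passing StringComparison.Ordinal makes IndexOf follow the same rules as Contains.
        int ordinalIndex = OrdinalIndexOf(text, word);

        // IndexOf(string) uses the current culture, so this value is environment-dependent.
        int cultureIndex = text.IndexOf(word);

        Console.WriteLine("Contains:          " + exist);
        Console.WriteLine("IndexOf (ordinal): " + ordinalIndex);
        Console.WriteLine("IndexOf (culture): " + cultureIndex);
    }
}
```

Stating the StringComparison explicitly at every call site is one way to avoid relying on the differing defaults of the two methods.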

There's no culture-sensitive overload of Contains in the .NET Framework; it's unclear to me whether you can use IndexOf reliably for this, but the CompareInfo class gives some more options. (I really don't know much about the details of cultural comparisons, particularly with RTL text. I just know it's complicated!)
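One way to explore the CompareInfo options mentioned above is a Contains-style helper that takes the culture and the CompareOptions explicitly. This is only a sketch: ContainsWith and CultureSearchDemo are hypothetical names, and the choice of CompareOptions.IgnoreNonSpace (documented as ignoring nonspacing combining characters such as diacritics) is an assumption about what a caller might want for Arabic text, not a recommendation from the answer. Only the ordinal result is predictable across environments; the linguistic results depend on the platform's collation data.

```csharp
using System;
using System.Globalization;

public static class CultureSearchDemo
{
    // Hypothetical helper: a Contains-like check where the caller chooses
    // both the culture and the comparison options.
    public static bool ContainsWith(string text, string word,
                                    CultureInfo culture, CompareOptions options)
    {
        return culture.CompareInfo.IndexOf(text, word, options) >= 0;
    }

    public static void Main()
    {
        // Strings from the question.
        var text = "مباركُ وبعض أكثر من نص";
        var word = "مبارك";
        var culture = CultureInfo.CurrentCulture;

        // Ordinal: same rules as String.Contains(string).
        Console.WriteLine("Ordinal:        " +
            ContainsWith(text, word, culture, CompareOptions.Ordinal));

        // Linguistic search with default options, as IndexOf(string) does.
        Console.WriteLine("None:           " +
            ContainsWith(text, word, culture, CompareOptions.None));

        // Linguistic search that ignores nonspacing combining marks.
        Console.WriteLine("IgnoreNonSpace: " +
            ContainsWith(text, word, culture, CompareOptions.IgnoreNonSpace));
    }
}
```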

answered Oct 15 '22 02:10 by Jon Skeet