 

Why does comparing two equal Persian words not return 0?

We have two identical-looking letters, 'ی' and 'ي'; the first one became the main letter as of Windows 7.
Back in the days of XP, the second one was the main letter.
Now the inputs I get are treated as different if one client is on Windows XP and the other on Windows 7.
I have also tried using the Persian culture, without success.
Am I missing anything?
EDIT: I had to change the words for better understanding; now they look similar.

// Needs: using System; using System.Globalization; using System.Linq;
foreach (CompareOptions i in Enum.GetValues(typeof(CompareOptions)).Cast<CompareOptions>())
    Console.WriteLine(string.Compare("محسنين", "محسنین", new CultureInfo("fa-IR"), i) + "\t : " + i);

Output:

-1       : None
-1       : IgnoreCase
-1       : IgnoreNonSpace
-1       : IgnoreSymbols
-1       : IgnoreKanaType
-1       : IgnoreWidth
1        : OrdinalIgnoreCase
-1       : StringSort
130      : Ordinal
Mohsen Sarkar asked Feb 20 '13 15:02


1 Answer

The two strings are not equal: in the strings you originally posted, the last letter differs.

About why IgnoreCase returns -1 but OrdinalIgnoreCase returns 1:

  • OrdinalIgnoreCase uses the invariant culture to convert both strings to upper case and then compares them char by char (by the numeric value of each UTF-16 code unit), not byte by byte.
  • IgnoreCase uses the specified culture to perform a linguistic, case-insensitive comparison.

The difference is that IgnoreCase applies the sorting rules of the specified culture. Those rules know "more" about the letters of that language and may order them differently than a plain comparison of their numeric values, which leads to a different result.
This is another manifestation of what became known as "the Turkish İ problem".
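The first bullet can be checked directly. The .NET documentation describes OrdinalIgnoreCase as behaving like calling ToUpperInvariant on both strings followed by an ordinal comparison. The sketch below (the class and method names are made up for illustration) computes both and compares only the sign of the results, since the exact magnitude an ordinal comparison returns is not something to rely on.

```csharp
using System;

static class OrdinalIgnoreCaseDemo
{
    // Sign of the framework's ordinal, case-insensitive comparison.
    public static int Framework(string a, string b) =>
        Math.Sign(string.Compare(a, b, StringComparison.OrdinalIgnoreCase));

    // The same comparison spelled out: upper-case with the invariant culture,
    // then compare the UTF-16 code units one by one (an ordinal comparison).
    public static int Manual(string a, string b) =>
        Math.Sign(string.CompareOrdinal(a.ToUpperInvariant(), b.ToUpperInvariant()));

    static void Main()
    {
        string first = "محسنين", second = "محسنین"; // strings from the question
        Console.WriteLine($"framework: {Framework(first, second)}, manual: {Manual(first, second)}");
    }
}
```

Arabic-script letters have no case, so for these strings upper-casing changes nothing and both results reduce to comparing the numeric values of the two kinds of yeh.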

You can verify it yourself by using the InvariantCulture instead of the Persian one:

// Needs: using System; using System.Globalization; using System.Linq;
foreach (CompareOptions i in Enum.GetValues(typeof(CompareOptions)).Cast<CompareOptions>())
    Console.WriteLine(string.Compare("محسنی", "محسني", CultureInfo.InvariantCulture, i) + "\t : " + i);

This will output 1 for both IgnoreCase and OrdinalIgnoreCase.

Regarding your edited question:
The two strings still differ. The following code prints the numeric value of every character in both strings.

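var strings = new[] { "محسنين", "محسنین" }; // the two strings from the question
foreach (var codes in strings.Select(s => s.Select(c => (int)c)))
    Console.WriteLine(string.Join(Environment.NewLine, codes) + Environment.NewLine); // one value per line, blank line between strings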

The result will look like this:

1605
1581
1587
1606
1610 // <-- "yeh": ي
1606

1605
1581
1587
1606
1740 // <-- "farsi yeh": ی
1606

As you can see, one character differs: 1610 (U+064A ARABIC LETTER YEH) versus 1740 (U+06CC ARABIC LETTER FARSI YEH). That is why the comparison treats the two strings as not equal.
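To locate such a difference automatically, and to treat both forms of yeh as the same letter, something like the following sketch could be used. It is not part of the original answer: the helper names are hypothetical, and mapping U+064A to U+06CC before comparing is one possible workaround, assuming that the yeh variant is the only difference between what XP and Windows 7 clients send. Because it changes the text, it would have to be applied consistently, for example to both incoming and stored values.

```csharp
using System;

static class YehComparison
{
    const char ArabicYeh = '\u064A'; // ي (1610)
    const char FarsiYeh = '\u06CC';  // ی (1740)

    // Index of the first differing UTF-16 code unit, or -1 if the strings are identical.
    // If one string is a prefix of the other, returns the shorter length.
    public static int FirstDifference(string a, string b)
    {
        int length = Math.Min(a.Length, b.Length);
        for (int i = 0; i < length; i++)
            if (a[i] != b[i]) return i;
        return a.Length == b.Length ? -1 : length;
    }

    // Hypothetical normalization step: replace every Arabic yeh with Farsi yeh.
    public static string NormalizeYeh(string s) => s.Replace(ArabicYeh, FarsiYeh);

    static void Main()
    {
        string first = "محسنين", second = "محسنین"; // strings from the question
        int i = FirstDifference(first, second);
        if (i == -1)
            Console.WriteLine("identical");
        else if (i < first.Length && i < second.Length)
            Console.WriteLine($"first difference at index {i}: U+{(int)first[i]:X4} vs U+{(int)second[i]:X4}");
        else
            Console.WriteLine($"one string is a prefix of the other (difference at index {i})");

        // After normalization, compare code unit by code unit.
        bool equal = string.Equals(NormalizeYeh(first), NormalizeYeh(second), StringComparison.Ordinal);
        Console.WriteLine($"equal after normalizing yeh: {equal}");
    }
}
```

An ordinal comparison is used after normalizing so that the result no longer depends on how a particular culture orders the two letters.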

Daniel Hilgarth answered Sep 27 '22 02:09