
String.compareToIgnoreCase returns wrong result

I'm using Java 8.

I've been struggling for a few days to understand a bug related to string comparison. Have a look at this test. The two strings are different: the "ı" in the first is not the same character as the "i" in the second, and neither is the upper- or lowercase version of the other.

I would expect this test to pass. The first assert does succeed, but the second one fails (for some reason compareToIgnoreCase returns 0).

Any idea what is going on?

Thanks

String str1 = "vırus";
String str2 = "virus";
Assert.assertNotEquals(0, str1.compareTo(str2));
Assert.assertNotEquals(0, str1.compareToIgnoreCase(str2));
asked May 24 '18 22:05 by Nisalon



1 Answer

The Javadoc of compareToIgnoreCase says:

Compares two strings lexicographically, ignoring case differences. This method returns an integer whose sign is that of calling compareTo with normalized versions of the strings where case differences have been eliminated by calling Character.toLowerCase(Character.toUpperCase(character)) on each character.
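The normalization described in that quote can be sketched roughly as follows. The `normalize` helper and the `NormalizeDemo` class are hypothetical names used only for illustration; the JDK does not build normalized copies like this, and the Javadoc only promises that the sign of the result matches. The dotless ı is written as the escape `\u0131` to avoid source-encoding problems.

```java
public class NormalizeDemo {
    // Hypothetical helper: applies the per-character normalization quoted
    // from the Javadoc, Character.toLowerCase(Character.toUpperCase(c)).
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(Character.toUpperCase(s.charAt(i))));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str1 = "v\u0131rus"; // second letter is the dotless i, U+0131
        String str2 = "virus";

        // Both strings normalize to the same text...
        System.out.println(normalize(str1).equals(normalize(str2))); // true
        // ...so the case-insensitive comparison reports equality,
        System.out.println(str1.compareToIgnoreCase(str2));          // 0
        // while the plain comparison still sees two different strings.
        System.out.println(str1.compareTo(str2) != 0);               // true
    }
}
```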

The ı character has no uppercase letter of its own: Unicode maps it to the plain I, so toUpperCase returns I and then toLowerCase returns i.

Similarly, the İ character has no lowercase letter of its own: toUpperCase leaves it unchanged and toLowerCase then returns i.

This means that compareToIgnoreCase considers these four letters to be the same:

  • ı - 'LATIN SMALL LETTER DOTLESS I' (U+0131)
  • i - 'LATIN SMALL LETTER I' (U+0069)
  • I - 'LATIN CAPITAL LETTER I' (U+0049)
  • İ - 'LATIN CAPITAL LETTER I WITH DOT ABOVE' (U+0130)
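To see where each of the four letters ends up, the two conversions can be traced one step at a time. This is only a small sketch; `DotlessIDemo` and `trace` are made-up names, and the code points are printed in hex so the output does not depend on the console encoding.

```java
public class DotlessIDemo {
    // The four letters from the list above, written as escapes where needed
    // to avoid source-encoding problems.
    static final char[] LETTERS = {'\u0131', 'i', 'I', '\u0130'};

    // Formats one line: original code point, result of toUpperCase,
    // then result of toLowerCase applied to that uppercase value.
    static String trace(char c) {
        char upper = Character.toUpperCase(c);
        char lower = Character.toLowerCase(upper);
        return String.format("U+%04X -> U+%04X -> U+%04X", (int) c, (int) upper, (int) lower);
    }

    public static void main(String[] args) {
        for (char c : LETTERS) {
            System.out.println(trace(c));
        }
        // Prints:
        // U+0131 -> U+0049 -> U+0069
        // U+0069 -> U+0049 -> U+0069
        // U+0049 -> U+0049 -> U+0069
        // U+0130 -> U+0130 -> U+0069
    }
}
```

Every line ends in U+0069, which is why any pair of these letters compares as equal under compareToIgnoreCase.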

The upper-, title-, and lowercase conversions are defined by Unicode and are listed in the Unicode data for each of the characters above. The entry for the uppercase I even has a comment:

Turkish and Azerbaijani use U+0131 for lowercase

And the lowercase i has the comment:

Turkish and Azerbaijani use U+0130 for uppercase

As mentioned in a comment by shmosel:

It's because character comparison is locale-insensitive. In a Turkish locale, the uppercase of i is İ and the lowercase of I is ı.
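The locale dependence can be observed with the locale-aware overloads String.toUpperCase(Locale) and String.toLowerCase(Locale). The sketch below assumes the JDK's usual handling of the "tr" language tag; `TurkishLocaleDemo` and its constant are illustrative names only, and results are printed as booleans to sidestep console-encoding issues.

```java
import java.util.Locale;

public class TurkishLocaleDemo {
    // Turkish locale, built from its language tag.
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    public static void main(String[] args) {
        // In a Turkish locale, i and I pair up with the dotted/dotless variants.
        System.out.println("i".toUpperCase(TURKISH).equals("\u0130")); // true: i -> İ
        System.out.println("I".toLowerCase(TURKISH).equals("\u0131")); // true: I -> ı

        // With the locale-neutral root locale, the familiar i <-> I pairing applies.
        System.out.println("i".toUpperCase(Locale.ROOT).equals("I"));  // true
        System.out.println("I".toLowerCase(Locale.ROOT).equals("i"));  // true
    }
}
```

compareToIgnoreCase takes no locale and works from the per-character Character.toUpperCase and Character.toLowerCase mappings, so it never applies the Turkish rules.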

answered Sep 25 '22 04:09 by Andreas
