I have a problem comparing strings. I want to compare two French strings, "éd" and "ef", like this:
Collator localeSpecificCollator = Collator.getInstance(Locale.FRANCE);
CollationKey a = localeSpecificCollator.getCollationKey("éd");
CollationKey b = localeSpecificCollator.getCollationKey("ef");
System.out.println(a.compareTo(b));
This prints -1, but in the French alphabet e comes before é. But when we compare only e and é, like this:
Collator localeSpecificCollator = Collator.getInstance(Locale.FRANCE);
CollationKey a = localeSpecificCollator.getCollationKey("é");
CollationKey b = localeSpecificCollator.getCollationKey("e");
System.out.println(a.compareTo(b));
the result is 1. Can you tell me what is wrong in the first part of the code?
This seems to be the expected behaviour and it also seems to be the correct way to sort alphabetically in French.
The Android javadoc gives a hint as to why it behaves like that - I suppose the implementation details in Android are similar, if not identical, to the standard JDK:
A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings.
In other words, because your two strings can be ordered by looking only at primary differences (the base letters, ignoring the accents), the collator never needs to look at the accent difference: d sorts before f, so "éd" sorts before "ef".
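To make this concrete, here is a minimal sketch that runs the same comparisons at two collator strengths. The class name `FrenchCollationDemo` and the helper `compare` are made up for illustration; only `Collator.getInstance`, `setStrength` and `compare` come from the JDK. It assumes the JDK's built-in French collation rules, so the exact integers may differ between implementations, but the signs should not.

```java
import java.text.Collator;
import java.util.Locale;

final class FrenchCollationDemo {

    // Hypothetical helper: compares two strings with a French collator
    // configured at the given strength (Collator.PRIMARY, SECONDARY, TERTIARY).
    static int compare(String left, String right, int strength) {
        Collator collator = Collator.getInstance(Locale.FRANCE);
        collator.setStrength(strength);
        return collator.compare(left, right);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";   // "é"
        String eAcuteD = "\u00e9d"; // "éd"

        // Base letters differ (d vs f), so the accent never gets consulted:
        // the result is negative, i.e. "éd" sorts before "ef".
        System.out.println(compare(eAcuteD, "ef", Collator.TERTIARY));

        // Base letters are identical, so the accent breaks the tie:
        // the result is positive, i.e. "é" sorts after "e".
        System.out.println(compare(eAcute, "e", Collator.TERTIARY));

        // At PRIMARY strength accents are ignored entirely:
        // "é" and "e" compare as equal (0).
        System.out.println(compare(eAcute, "e", Collator.PRIMARY));
    }
}
```

The last call is the useful one for seeing what the collator does internally: at the primary level "é" and "e" are the same letter, which is why "éd" versus "ef" is decided by d versus f alone.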
It seems to be compliant with the Unicode Collation Algorithm (UCA):
Accent differences are typically ignored, if the base letters differ.
And it also seems to be the correct way to sort alphabetically in French, according to the Wikipedia article on "ordre alphabétique":
En première analyse, les caractères accentués, de même que les majuscules, ont le même rang alphabétique que le caractère fondamental
Si plusieurs mots ont le même rang alphabétique, on tâche de les distinguer entre eux grâce aux majuscules et aux accents (pour le e, on a l'ordre e, é, è, ê, ë)
In English: the order initially ignores accents and case - if two words can't be ordered that way, accents and case are then taken into account.
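As a rough illustration of that rule, the sketch below sorts the four strings from the question with a French collator and, for contrast, with plain `String` ordering. `FrenchSortDemo` and its two methods are hypothetical names for this example; it assumes the JDK's default French collation rules.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class FrenchSortDemo {

    // Hypothetical helper: returns a copy of the words sorted with the
    // French collator (accents matter only when the base letters tie).
    static List<String> sortFrench(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(Locale.FRANCE));
        return sorted;
    }

    // Hypothetical helper: returns a copy sorted by String.compareTo,
    // i.e. by UTF-16 code unit values, with no linguistic knowledge.
    static List<String> sortByCodePoint(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // "ef", "éd", "é", "e"
        List<String> words = List.of("ef", "\u00e9d", "\u00e9", "e");

        // Collator order: e, é, éd, ef
        System.out.println(sortFrench(words));

        // Code unit order: e, ef, é, éd
        System.out.println(sortByCodePoint(words));
    }
}
```

With the collator, "e" and "é" tie on base letters and are separated by the accent, while "éd" and "ef" are separated by d and f before the accent is ever considered. Plain `String` ordering instead puts every word starting with "é" after every word starting with "e".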