
Java CollationKey sorting wrong

I have a problem comparing strings. I want to compare two French strings, "éd" and "ef", like this:

Collator localeSpecificCollator = Collator.getInstance(Locale.FRANCE);
CollationKey a = localeSpecificCollator.getCollationKey("éd");
CollationKey b = localeSpecificCollator.getCollationKey("ef");
System.out.println(a.compareTo(b));

This prints -1, but in the French alphabet e comes before é. However, when we compare only é and e, like this:

Collator localeSpecificCollator = Collator.getInstance(Locale.FRANCE);
CollationKey a = localeSpecificCollator.getCollationKey("é");
CollationKey b = localeSpecificCollator.getCollationKey("e");
System.out.println(a.compareTo(b));

the result is 1. Can you tell me what is wrong with the first piece of code?

Ashot asked Aug 10 '12 10:08



1 Answer

This seems to be the expected behaviour and it also seems to be the correct way to sort alphabetically in French.

The Android javadoc gives a hint as to why it behaves like that. I suppose the implementation details in Android are similar, if not identical, to those of the standard JDK:

A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings.

In other words, because your two strings can already be ordered by their primary differences alone (the base letters, ignoring accents: "d" versus "f"), the collator never gets as far as considering the accent difference.
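To see the levels in isolation, here is a minimal sketch that repeats the comparisons from the question and adds two more. The class name CollatorStrengthDemo and the helper compare are made up for illustration; only Collator, its strength constants, and Locale.FRANCE come from the JDK. The accented letter is written as a Unicode escape so the result does not depend on the source file encoding.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorStrengthDemo {
    // Hypothetical helper: compares two strings with a French collator
    // at the given strength and normalizes the result to -1, 0 or 1.
    static int compare(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.FRANCE);
        collator.setStrength(strength);
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // é

        // Base letters differ (d vs f): the primary difference decides,
        // and the accent on the first letter is never consulted.
        System.out.println(compare(eAcute + "d", "ef", Collator.TERTIARY)); // -1

        // Base letters are identical: the accent is the only difference left.
        System.out.println(compare(eAcute, "e", Collator.TERTIARY));        // 1
        System.out.println(compare(eAcute + "d", "ed", Collator.TERTIARY)); // 1

        // At PRIMARY strength accents are ignored entirely.
        System.out.println(compare(eAcute, "e", Collator.PRIMARY));         // 0
    }
}
```

The first two calls correspond to the two snippets in the question, so the -1 and the 1 are consistent with each other rather than contradictory.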

It seems to be compliant with the Unicode Collation Algorithm (UCA):

Accent differences are typically ignored, if the base letters differ.

And it also seems to be the correct way to sort alphabetically in French, according to the French Wikipedia article on "ordre alphabétique" (translated here):

At first, accented characters, like uppercase letters, have the same alphabetical rank as the base character.
If several words have the same alphabetical rank, one tries to distinguish them by means of uppercase letters and accents (for e, the order is e, é, è, ê, ë).

In short: the order initially ignores accents and case. Only if two words can't be ordered that way are accents and case taken into account.
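The same rule can be observed by sorting a small word list. This is only a sketch: the class FrenchSortDemo, the method sortFrench, and the word list are invented for illustration. It relies on the fact that Collator implements Comparator, so it can be passed directly to List.sort.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class FrenchSortDemo {
    // Hypothetical helper: returns a copy of the list sorted with the
    // French collator, leaving the input list untouched.
    static List<String> sortFrench(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(Locale.FRANCE));
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00e9" is é, escaped to avoid source-encoding issues.
        List<String> words = Arrays.asList("ef", "\u00e9d", "ed", "\u00e9", "e");

        // Base letters are compared first; accents only break ties between
        // words whose base letters are identical (e/é and ed/éd).
        System.out.println(sortFrench(words)); // [e, é, ed, éd, ef]
    }
}
```

Note that "éd" lands after "ed" but before "ef", which is exactly the result the question found surprising.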

assylias answered Oct 02 '22 08:10