I'm trying to sort a List of objects by the String field "country". Each country name is in its native language.
What I want is for "България", for instance, to appear after the "A*" countries, since the letter 'Б' corresponds to the Latin 'B'. I'm using the default Collator, but non-Latin names still end up last in the list.
Here's my code so far:
private static final Comparator<DomainTO> DOMAIN_COUNTRY_COMPARATOR =
new Comparator<DomainTO>() {
@Override
public int compare(DomainTO t, DomainTO t1) {
Collator defaultCollator = Collator.getInstance();
return defaultCollator.compare(t.getCountry(), t1.getCountry());
}
};
How do you sort words from different languages? There are many alphabets (English, Russian, German, etc.).
Each one has its own ordered list of letters. It is easy to sort words that come from a single alphabet. But is it possible to merge all these alphabets into one?
I think it is not possible to do it in a way that everyone could accept. As an example, take the English and Russian alphabets.
Russian letters can be mapped to English letters (at least most of them), but after this mapping their order would change.
This would favor one alphabet over another. Why not map English letters to Russian ones instead?
Another issue is special letters. In German, Ö comes between O and P, and in Polish Ó occupies the same place.
So we have following relations:
O < Ö < P
O < Ó < P
But what is the relation between Ö and Ó? If there were a country called Ósterreich, should it come before or after Österreich? So it is impossible to define universal rules for sorting words from different languages.
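In Java this shows up in the API itself: Collator.getInstance takes a Locale, and every locale brings its own rules. Here is a small sketch (the class and method names are made up for illustration) that checks the two relations above under German and Polish rules. It prints the Ö versus Ó comparison without claiming a result, because neither alphabet defines that relation and whatever a particular collator returns there is an implementation detail:

```java
import java.text.Collator;
import java.util.Locale;

class LocaleCollationDemo {
    // Returns -1, 0 or 1 for the comparison of a and b under the locale's rules.
    static int sign(Locale locale, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        Locale german = Locale.GERMAN;
        Locale polish = Locale.forLanguageTag("pl");

        // Relations that each alphabet defines for its own letters.
        System.out.println("de: O vs Ö -> " + sign(german, "O", "Ö"));
        System.out.println("de: Ö vs P -> " + sign(german, "Ö", "P"));
        System.out.println("pl: O vs Ó -> " + sign(polish, "O", "Ó"));
        System.out.println("pl: Ó vs P -> " + sign(polish, "Ó", "P"));

        // Not defined by either alphabet: the answer depends on the collator.
        System.out.println("de: Ö vs Ó -> " + sign(german, "Ö", "Ó"));
        System.out.println("pl: Ö vs Ó -> " + sign(polish, "Ö", "Ó"));
    }
}
```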
All we can do is map all alphabets to a chosen one. And this is what the OP is trying to do.
The chosen one is the Latin alphabet, and the other alphabets have to be mapped to it.
The problem is that this mapping is often ambiguous. Only Russian or Greek letters (most of them) can be mapped easily.
Arabic or Asian languages are a much bigger problem. And we should remember that mapping from one alphabet to another often loses something.
So how can we do such sorting?
Code:
// Partial table (А to Щ only): russian[i] is replaced by russianTo[i].
char[] russian = "АаБбВвГгДдЕеЁёЖжЗзИиЙйКкЛлМмНнОоПпРрСсТтУуФфХхЦцЧчШшЩщ".toCharArray();
char[] russianTo = "AaBbWwGgDdEeEeZzZzIiJjKkLlMmNnOoPpRrSsTtUuFfHhCcCcSsSs".toCharArray();
// "input" is the String to convert.
for (int i = 0; i < russian.length; i++) {
    input = input.replace(russian[i], russianTo[i]);
}
This way we have converted the letters of the Russian alphabet. Now we have to add similar code for the other alphabets. And Russian was the simplest one.
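To make the idea concrete, here is a rough sketch of the whole pipeline: transliterate each name to a Latin key, then sort by that key while keeping the original spelling in the list. The class name, the method names and the tiny mapping table are all invented for this example. The table covers only the letters needed for the two sample names, and mapping 'ъ' to "a" and 'я' to "ya" is one transliteration choice among several:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TransliteratingSort {
    // Hypothetical, deliberately incomplete Cyrillic-to-Latin table.
    private static final Map<Character, String> TO_LATIN = new HashMap<>();
    static {
        TO_LATIN.put('Б', "B");
        TO_LATIN.put('Р', "R");
        TO_LATIN.put('а', "a");
        TO_LATIN.put('г', "g");
        TO_LATIN.put('и', "i");
        TO_LATIN.put('л', "l");
        TO_LATIN.put('о', "o");
        TO_LATIN.put('р', "r");
        TO_LATIN.put('с', "s");
        TO_LATIN.put('ъ', "a");
        TO_LATIN.put('я', "ya");
    }

    // Builds the Latin sort key; characters without a mapping are kept as-is.
    static String toLatin(String input) {
        StringBuilder key = new StringBuilder();
        for (char c : input.toCharArray()) {
            String mapped = TO_LATIN.get(c);
            key.append(mapped != null ? mapped : String.valueOf(c));
        }
        return key.toString();
    }

    // Returns a sorted copy: ordered by the Latin key, original names preserved.
    static List<String> sortByLatinKey(List<String> names) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort((a, b) -> collator.compare(toLatin(a), toLatin(b)));
        return sorted;
    }
}
```

With this table, "България" gets the key "Balgariya" and lands between "Albania" and "Canada", which is the order the OP asked for. Every further alphabet needs its own table, and that is where the ambiguity described above comes in.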
But assume that we succeeded and managed to sort words from all the languages of the world this way.
What are the consequences of such sorting? Before we answer this question, let's ask what the intention behind it was.
The OP didn't give his reasons for wanting such sorting. But we can deduce them:
So let's answer the question: does this sorting make it easier to find a specific country for someone who knows only his native language?
Summary:
Sorting country names written in different languages is difficult to define and implement. And when implemented, it would be either unhelpful or harmful.
Perhaps you can compare the normalized Strings. Something like this:
private static final Comparator<DomainTO> DOMAIN_COUNTRY_COMPARATOR =
new Comparator<DomainTO>() {
private String normalize(final String input) {
return Normalizer
.normalize(input, Normalizer.Form.NFD)
.replaceAll("[^\\p{ASCII}]", "");
}
@Override
public int compare(final DomainTO t, final DomainTO t1) {
return normalize(t.getCountry()).compareTo(
normalize(t1.getCountry()));
}
};
See the related question about normalizing: "Converting Java String to ascii" (that question is linked to several similar questions).
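It is worth checking what this normalization does to the inputs from the question. The sketch below copies the normalize method from the comparator above into a stand-alone class (the class name is made up). NFD splits an accented Latin letter into a base letter plus a combining mark, and the regex then drops everything that is not ASCII. So accents disappear, but Cyrillic letters have no ASCII base letter and are removed entirely:

```java
import java.text.Normalizer;

class NormalizeDemo {
    // Same logic as the normalize method in the comparator above.
    static String normalize(String input) {
        return Normalizer
                .normalize(input, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Österreich"));           // prints Osterreich
        System.out.println("[" + normalize("България") + "]"); // prints []
    }
}
```

So this approach seems to help with accented Latin names such as "Österreich", while every name written purely in Cyrillic gets an empty key and compares as equal to the others. For those names it would have to be combined with some kind of transliteration.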