I have a list of Arabic words that I'd like to sort. I have tried the standard Collator with different Locales (such as English and French, without much hope) and have even created my own RuleBasedCollator, but to no avail. Apparently the default sorting relies on the order of the Unicode values, which works in many cases but not in this one.
Following the javadocs, the RuleBasedCollator requires a string specifying the characters in the order you want them sorted. I created the following string, taking the Unicode code points from this table:
String arabicLetters = "< \u0623=\uFE83=\uFE84 < \u0628=\uFE8F=\uFE90=\uFE92=\uFE91 < \u062A=\uFE95=\uFE96=\uFE98=\uFE97 < \u062B=\uFE99=\uFE9A=\uFE9C=\uFE9B < \u062C=\uFE9D=\uFE9E=\uFEA0=\uFE9F < \u062D=\uFEA1=\uFEA2=\uFEA4=\uFEA3 < \u062E=\uFEA5=\uFEA6=\uFEA8=\uFEA7 < \u062F=\uFEA9=\uFEAA < \u0630=\uFEAB=\uFEAC < \u0631=\uFEAD=\uFEAE < \u0632=\uFEAF=\uFEB0 < \u0633=\uFEB1=\uFEB2=\uFEB4=\uFEB3 < \u0634=\uFEB5=\uFEB6=\uFEB8=\uFEB7 < \u0635=\uFEB9=\uFEBA=\uFEBC=\uFEBB < \u0636=\uFEBD=\uFEBE=\uFEC0=\uFEBF < \u0637=\uFEC1=\uFEC2=\uFEC4=\uFEC3 < \u0638=\uFEC5=\uFEC6=\uFEC8=\uFEC7 < \u0639=\uFEC9=\uFECA=\uFECC=\uFECB < \u063A=\uFECD=\uFECE=\uFED0=\uFECF < \u0641=\uFED1=\uFED2=\uFED4=\uFED3 < \u0642=\uFED5=\uFED6=\uFED8=\uFED7 < \u0643=\uFED9=\uFEDA=\uFEDC=\uFEDB < \u0644=\uFEDD=\uFEDE=\uFED0=\uFEDF < \u0645=\uFEE1=\uFEE2=\uFEE4=\uFEE3 < \u0646=\uFEE5=\uFEE6=\uFEE8=\uFEE7 < \u0647=\uFEE9=\uFEEA=\uFEEC=\uFEEB < \u0648=\uFEED=\uFEEE < \u064A=\uFEF1=\uFEF2=\uFEF4=\uFEF3 < \u0622=\uFE81=\uFE82 < \u0629=\uFE93=\uFE94 < \u0649=\uFEEF=\uFEF0 < \u0627";
Arabic letters can take four forms depending on their position in a word. So in the rules string above I made all 4 forms of each letter equal with '=', and then indicated the order of the letters by separating them with '<'. I imagine that this is the correct way to do it.
Now, if I have a collection with the days of the week (sorted in this case by day of the week, not 'alphabetically'):
الأَحَد, الاِثنَين, الثُّلاثاء, الأَربِعاء, الخَميس, الجُمعة, السَّبت
The results I am getting are not sorted at all:
الأَحَد, الخَميس, الاِثنَين, الثُّلاثاء, الأَربِعاء, السَّبت, الجُمعة
Besides, for such a small number of words it takes a considerable amount of time, which makes it unusable.
Does anybody know if I'm doing something wrong or if there is a life-saving library that already handles this?
I did some googling before writing this and I'm surprised I didn't find a single result.
Thanks!
UPDATE with code:
public static class TranslatableComparator implements java.util.Comparator<Translatable> {
@Override
public int compare(Translatable t1, Translatable t2) {
String sortingRules = "< \u0623=\uFE83=\uFE84 < \u0628=\uFE8F=\uFE90=\uFE92=\uFE91 < \u062A=\uFE95=\uFE96=\uFE98=\uFE97 < \u062B=\uFE99=\uFE9A=\uFE9C=\uFE9B < \u062C=\uFE9D=\uFE9E=\uFEA0=\uFE9F < \u062D=\uFEA1=\uFEA2=\uFEA4=\uFEA3 < \u062E=\uFEA5=\uFEA6=\uFEA8=\uFEA7 < \u062F=\uFEA9=\uFEAA < \u0630=\uFEAB=\uFEAC < \u0631=\uFEAD=\uFEAE < \u0632=\uFEAF=\uFEB0 < \u0633=\uFEB1=\uFEB2=\uFEB4=\uFEB3 < \u0634=\uFEB5=\uFEB6=\uFEB8=\uFEB7 < \u0635=\uFEB9=\uFEBA=\uFEBC=\uFEBB < \u0636=\uFEBD=\uFEBE=\uFEC0=\uFEBF < \u0637=\uFEC1=\uFEC2=\uFEC4=\uFEC3 < \u0638=\uFEC5=\uFEC6=\uFEC8=\uFEC7 < \u0639=\uFEC9=\uFECA=\uFECC=\uFECB < \u063A=\uFECD=\uFECE=\uFED0=\uFECF < \u0641=\uFED1=\uFED2=\uFED4=\uFED3 < \u0642=\uFED5=\uFED6=\uFED8=\uFED7 < \u0643=\uFED9=\uFEDA=\uFEDC=\uFEDB < \u0644=\uFEDD=\uFEDE=\uFED0=\uFEDF < \u0645=\uFEE1=\uFEE2=\uFEE4=\uFEE3 < \u0646=\uFEE5=\uFEE6=\uFEE8=\uFEE7 < \u0647=\uFEE9=\uFEEA=\uFEEC=\uFEEB < \u0648=\uFEED=\uFEEE < \u064A=\uFEF1=\uFEF2=\uFEF4=\uFEF3 < \u0622=\uFE81=\uFE82 < \u0629=\uFE93=\uFE94 < \u0649=\uFEEF=\uFEF0 < \u0627";
RuleBasedCollator col = null;
try {
col = new RuleBasedCollator(sortingRules);
} catch (ParseException e) {
//col = (RuleBasedCollator)RuleBasedCollator.getInstance(Locale.FRENCH);
}
return col.getCollationKey(t1.getTranslation().getText()).compareTo(col.getCollationKey(t2.getTranslation().getText()));
}
}
You don't need to define your own collator; just use the built-in one for Arabic. Your Comparator then looks like this:
public int compare(Translatable t1, Translatable t2) {
    return Collator.getInstance(new Locale("ar")).compare(t1.getTranslation().getText(), t2.getTranslation().getText());
}
(You can check whether a collator is available for Arabic by browsing the results of Collator.getAvailableLocales().)
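For plain strings, the same idea can be tried out with a small self-contained sketch. The class and method names here (ArabicSortSketch, sortArabic) are made up for illustration, and Locale.forLanguageTag("ar") is used in place of new Locale("ar") only because that constructor is deprecated in recent JDKs. The collator is created once and reused for the whole sort rather than once per comparison:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ArabicSortSketch {

    // Returns a sorted copy of the input, ordered by the JDK's collator for "ar".
    // The collator is built once per call and reused for every comparison.
    public static List<String> sortArabic(List<String> words) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("ar"));
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        // Day names taken from the question.
        List<String> days = Arrays.asList(
                "الأَحَد", "الاِثنَين", "الثُّلاثاء", "الأَربِعاء",
                "الخَميس", "الجُمعة", "السَّبت");
        for (String day : sortArabic(days)) {
            System.out.println(day);
        }
    }
}
```

Whether the resulting order matches what a native reader expects for vocalized text depends on the collation rules shipped with your JDK, so it is worth checking the output on your own data.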
As noted in the comments, if you're worried about performance you should calculate the collation keys, store them in your Translatable objects and sort on them instead.
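A minimal sketch of the collation-key approach, assuming plain strings instead of Translatable objects. KeyedText is a hypothetical stand-in that holds the text together with its precomputed key, which is the role your Translatable would play:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class CollationKeySketch {

    // Hypothetical stand-in for Translatable: the text plus its precomputed key.
    static final class KeyedText {
        final String text;
        final CollationKey key;

        KeyedText(String text, CollationKey key) {
            this.text = text;
            this.key = key;
        }
    }

    // Computes each key exactly once, then sorts by comparing keys only.
    public static List<String> sortWithKeys(List<String> words, Collator collator) {
        List<KeyedText> keyed = new ArrayList<>();
        for (String word : words) {
            keyed.add(new KeyedText(word, collator.getCollationKey(word)));
        }
        // CollationKey is Comparable, so it can serve directly as the sort key.
        keyed.sort(Comparator.comparing((KeyedText k) -> k.key));

        List<String> sorted = new ArrayList<>();
        for (KeyedText k : keyed) {
            sorted.add(k.text);
        }
        return sorted;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("ar"));
        List<String> days = Arrays.asList("الخَميس", "الجُمعة", "السَّبت");
        System.out.println(sortWithKeys(days, collator));
    }
}
```

Keys are only comparable with other keys produced by the same collator, so if you cache them in your objects, keep using that one collator instance to generate them.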
If you really want to see where the differences are between what you defined and the standard collator, just print out the rules:
System.out.println(((RuleBasedCollator) Collator.getInstance(new Locale("ar"))).getRules());