ä letter sorting problem in Java

Question

Hi, I have this piece of code:

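import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

Collator col = Collator.getInstance(Locale.GERMAN);

List<String> list = new ArrayList<String>();
list.add("ac");
list.add("äb");
list.add("aa");
list.add("bb");

Collections.sort(list, col);
System.out.println(list);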

I would expect the output [aa, ac, äb, bb], but instead I get: [aa, äb, ac, bb]

I have no idea what I am doing wrong... Thanks in advance for any help.
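A minimal sketch of what is going on, assuming the JDK's built-in `java.text` collation data (the class and method names `GermanCollatorDemo`, `primaryCompare` and `germanSort` are hypothetical). The German `Collator` gives ä the same primary weight as a and treats the umlaut only as a secondary (accent) difference. At PRIMARY strength the two therefore compare equal, and in a full comparison the first primary difference in the string (here b vs. c) decides before the accent is ever consulted.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class GermanCollatorDemo {

    // Compares two strings with the German collator restricted to PRIMARY strength,
    // so only base letters count; accents (secondary) and case (tertiary) are ignored.
    static int primaryCompare(String x, String y) {
        Collator col = Collator.getInstance(Locale.GERMAN); // returns a fresh instance
        col.setStrength(Collator.PRIMARY);
        return col.compare(x, y);
    }

    // Sorts a copy of the input with the default (TERTIARY strength) German collator.
    static List<String> germanSort(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Collator.getInstance(Locale.GERMAN));
        return copy;
    }

    public static void main(String[] args) {
        // ä and a share the same primary weight, so this prints 0.
        System.out.println(primaryCompare("\u00e4", "a"));
        // "äb" vs "ac" is settled by b < c, so "äb" sorts before "ac".
        System.out.println(germanSort(Arrays.asList("ac", "\u00e4b", "aa", "bb")));
    }
}
```

In other words, the collator is behaving as designed: the secondary difference between a and ä only breaks ties between strings whose base letters are otherwise identical.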

Hi, and thanks, everyone, for the answers.

Unfortunately, the project requirements clearly state that the strings must be sorted in this order: [aa, ac, äb, bb]. So I tried this code:

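import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

 String europeanRules =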
        ("< a,A ; \u00e0,\u00c0 ; \u00e1,\u00c1 ; \u00e2,\u00c2 ; \u00e3,\u00c3; \u00e4,\u00c4 ; \u00e5,\u00c5 ; \u00e6,\u00c6 "+
                "; \u0101,\u0100 ; \u0103,\u0102 ; \u0105,\u0104 " +       
         "< b,B < c,C ; \u00e7,\u00c7 ; \u0107,\u0106 ; \u0109,\u0108 ; \u010b,\u010a ; \u010d,\u010c " +
         "< d,D ; \u010f,\u010e ; \u0111,\u0110 " +
         "< e,E ; \u00e8,\u00c8 ; \u00e9,\u00c9 ; \u00ea,\u00ca ; \u00eb,\u00cb " +
             "; \u0113,\u0112 ; \u0115,\u0114 ; \u0116,\u0117 ; \u0119,\u0118 ; \u011b,\u011a " +
         "< f,F < g,G < h,H " +
         "< i,I ; \u00ec,\u00cc ; \u00ed,\u00cd ; \u00ee,\u00ce ; \u00ef,\u00cf " +
         "< j,J < k,K " +
         "< l,L ; \u013a,\u0139 ; \u013c,\u013b ; \u013e,\u013d ; \u0140,\u013f ; \u0142,\u0141 " +
         "< m,M < n,N ; \u00f1,\u00d1 ; \u0144,\u0143 ; \u0146,\u0145 ; \u0148,\u0147 " +
         "< o,O ; \u00f2,\u00d2 ; \u00f3,\u00d3 ; \u00f4,\u00d4 ; \u00f5,\u00d5 ; \u00f6,\u00d6 ; \u00f8,\u00d8 " +
             "; \u014d,\u014c ; \u014f,\u014e ; \u0151,\u0150 " +
         "< p,P < q,Q < r,R ; \u0155,\u0154 ; \u0157,\u0156 ; \u0159,\u0158 " +
         "< s,S ; \u015b,\u015a ; \u015d,\u015c ; \u015f,\u015e ; \u0161,\u0160 " +
         "< t,T ; \u0163,\u0162 ; \u0165,\u0164 ; \u0167,\u0166 " +
         "< u,U ; \u00f9,\u00d9 ; \u00fa,\u00da ; \u00fb,\u00db ; \u00fc,\u00dc ; \u0169,\u0168 ; \u016b,\u016a ; \u016d,\u016c " +
             "; \u016f,\u016e ; \u0171,\u0170 ; \u0173,\u0172 " +
         "< v,V < w,W ; \u0175,\u0174 " +
         "< x,X < y,Y ; \u00fd,\u00dd ; \u00ff ; \u0177,\u0176 ; \u0178 " +
         "< z,Z ; \u017a,\u0179 ; \u017c,\u017b ; \u017e,\u017d");      

    RuleBasedCollator col;
    try {
        col = new RuleBasedCollator(europeanRules);
    } catch (ParseException e) {
        // Don't swallow the error: col would stay null and setStrength() would throw an NPE
        throw new IllegalStateException("Invalid collation rules", e);
    }
    col.setStrength(Collator.SECONDARY);
    col.setDecomposition(Collator.FULL_DECOMPOSITION);

    List<String> list = new ArrayList<String>();
    list.add("ac");
    list.add("äb");
    list.add("aa");
    list.add("bb");
    Collections.sort(list, col);
    System.out.println(list);

\u00e4 is the Unicode escape for ä (code point U+00E4), so as I understand it this should work. Or am I doing something wrong? Thanks in advance for any help.
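A likely explanation, sketched with a cut-down rule set (the class name `UmlautRuleDemo`, the helper `sortWith` and the reduced a to c rule strings are all hypothetical). In `RuleBasedCollator` rule syntax, `;` declares a secondary difference (an accent variant of the preceding letter) and `<` declares a primary difference (a separate letter). Listing `\u00e4,\u00c4` after `;` therefore keeps ä a variant of a, just as the built-in German rules do, and setting the strength to SECONDARY does not change the fact that primary weights are compared first across the whole string.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UmlautRuleDemo {

    // Sorts a copy of the input with a RuleBasedCollator built from the given rules.
    static List<String> sortWith(String rules, List<String> input) {
        RuleBasedCollator col;
        try {
            col = new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid collation rules: " + rules, e);
        }
        List<String> copy = new ArrayList<>(input);
        copy.sort(col);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ac", "\u00e4b", "aa", "bb");

        // ';' makes ä a SECONDARY variant of a: "äb" vs "ac" is decided by b < c first.
        String secondary = "< a,A ; \u00e4,\u00c4 < b,B < c,C";
        // '<' makes ä a separate PRIMARY letter sorted between a and b.
        String primary = "< a,A < \u00e4,\u00c4 < b,B < c,C";

        System.out.println(sortWith(secondary, words)); // [aa, äb, ac, bb]
        System.out.println(sortWith(primary, words));   // [aa, ac, äb, bb]
    }
}
```

Presumably the same change carries over to the full rule string: give `\u00e4,\u00c4` its own `<` entry after the last a variant and before `b`, rather than chaining it into the a group with `;` (otherwise the entries that follow it, such as å, would become variants of ä).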

the.duckman · Accepted Answer

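The order you get is correct, at least according to the Wikipedia entry on this subject (sorry, it's in German; Google Translate might help you, although it corrupts the umlauts for me...). In the standard German dictionary ordering (DIN 5007, variant 1), ä is sorted as if it were a, so the following letters decide, and "äb" comes before "ac" because b < c.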

MicSim · Answer

If you want your accented characters to always come after the normal ones, you can prepend an @ to the relevant entries in the rule string you pass to the RuleBasedCollator.

The definitions of the rule elements are as follows:

[...]

Modifier: There are currently two modifiers that turn on special collation rules.

'@' : Turns on backwards sorting of accents (secondary differences), as in French.

'!' : Turns on Thai/Lao vowel-consonant swapping. If this rule is in force when a Thai vowel of the range \U0E40-\U0E44 precedes a Thai consonant of the range \U0E01-\U0E2E OR a Lao vowel of the range \U0EC0-\U0EC4 precedes a Lao consonant of the range \U0E81-\U0EAE then the vowel is placed after the consonant for collation purposes.

[...]

So your sample code would look like this:

(I made the change only for the ä character, i.e. @\u00e4, @\u00c4)

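import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

 String europeanRules =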
        ("< a,A ; \u00e0,\u00c0 ; \u00e1,\u00c1 ; \u00e2,\u00c2 ; \u00e3,\u00c3; @\u00e4,@\u00c4 ; \u00e5,\u00c5 ; \u00e6,\u00c6 "+
                "; \u0101,\u0100 ; \u0103,\u0102 ; \u0105,\u0104 " +       
         "< b,B < c,C ; \u00e7,\u00c7 ; \u0107,\u0106 ; \u0109,\u0108 ; \u010b,\u010a ; \u010d,\u010c " +
         "< d,D ; \u010f,\u010e ; \u0111,\u0110 " +
         "< e,E ; \u00e8,\u00c8 ; \u00e9,\u00c9 ; \u00ea,\u00ca ; \u00eb,\u00cb " +
             "; \u0113,\u0112 ; \u0115,\u0114 ; \u0116,\u0117 ; \u0119,\u0118 ; \u011b,\u011a " +
         "< f,F < g,G < h,H " +
         "< i,I ; \u00ec,\u00cc ; \u00ed,\u00cd ; \u00ee,\u00ce ; \u00ef,\u00cf " +
         "< j,J < k,K " +
         "< l,L ; \u013a,\u0139 ; \u013c,\u013b ; \u013e,\u013d ; \u0140,\u013f ; \u0142,\u0141 " +
         "< m,M < n,N ; \u00f1,\u00d1 ; \u0144,\u0143 ; \u0146,\u0145 ; \u0148,\u0147 " +
         "< o,O ; \u00f2,\u00d2 ; \u00f3,\u00d3 ; \u00f4,\u00d4 ; \u00f5,\u00d5 ; \u00f6,\u00d6 ; \u00f8,\u00d8 " +
             "; \u014d,\u014c ; \u014f,\u014e ; \u0151,\u0150 " +
         "< p,P < q,Q < r,R ; \u0155,\u0154 ; \u0157,\u0156 ; \u0159,\u0158 " +
         "< s,S ; \u015b,\u015a ; \u015d,\u015c ; \u015f,\u015e ; \u0161,\u0160 " +
         "< t,T ; \u0163,\u0162 ; \u0165,\u0164 ; \u0167,\u0166 " +
         "< u,U ; \u00f9,\u00d9 ; \u00fa,\u00da ; \u00fb,\u00db ; \u00fc,\u00dc ; \u0169,\u0168 ; \u016b,\u016a ; \u016d,\u016c " +
             "; \u016f,\u016e ; \u0171,\u0170 ; \u0173,\u0172 " +
         "< v,V < w,W ; \u0175,\u0174 " +
         "< x,X < y,Y ; \u00fd,\u00dd ; \u00ff ; \u0177,\u0176 ; \u0178 " +
         "< z,Z ; \u017a,\u0179 ; \u017c,\u017b ; \u017e,\u017d");      
    
    RuleBasedCollator col;
    try {
        col = new RuleBasedCollator(europeanRules);
    } catch (ParseException e) {
        // Don't swallow the error: col would stay null and setStrength() would throw an NPE
        throw new IllegalStateException("Invalid collation rules", e);
    }
    col.setStrength(Collator.SECONDARY);
    col.setDecomposition(Collator.FULL_DECOMPOSITION);

    List<String> list = new ArrayList<String>();
    list.add("ac");
    list.add("äb");
    list.add("aa");
    list.add("af");
    list.add("bb");
    Collections.sort(list, col);
    System.out.println(list);

The output is:

[aa, ac, af, äb, bb]

Tags: java, sorting, unicode, localization

Grzegorz
