
How to set up a Collator's strength and decomposition to sort Unicode strings by first letter

I have a list of Unicode strings that I want to sort by first letter. The problem is that I don't know how to set up java.text.Collator so that it treats similar letters (such as S and Š) as different letters.

This is what I get now:

  • Rokiškis
  • Šakiai
  • Salantai
  • Šeduva
  • Šiauliai
  • Šilalė
  • Skuodas
  • Tauragė
  • Telšiai

This is what I want to get (a word beginning with Š should always come after every word beginning with S, regardless of the second letter):

  • Rokiškis
  • Salantai
  • Skuodas
  • Šakiai
  • Šeduva
  • Šiauliai
  • Šilalė
  • Tauragė
  • Telšiai
Rytis Alekna asked Nov 12 '22 19:11


1 Answer

We can create a class that extends Collator and override its compare method.

Here is an example:

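import java.text.CollationKey;
import java.text.Collator;

public class MyCollator extends Collator {

    // Orders strings by UTF-16 code unit value, exactly as String.compareTo does.
    @Override
    public int compare(String source, String target) {
        return source.compareTo(target);
    }

    // Collation keys are not needed by Collections.sort, so none is provided here.
    @Override
    public CollationKey getCollationKey(String source) {
        return null;
    }

    // A constant hash code is legal, although it makes hash-based lookups slow.
    @Override
    public int hashCode() {
        return 0;
    }

}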

We can then use this class to sort the list of strings:

Collator collator = new MyCollator();

Collections.sort(list, collator);

My test result is as follows:

  • Rokiškis
  • Salantai
  • Skuodas
  • Tauragė
  • Telšiai
  • Šakiai
  • Šeduva
  • Šiauliai
  • Šilalė

Note that in this result Š sorts after T. This is because String.compareTo compares UTF-16 code unit values, and Š (U+0160) has a higher value than T (U+0054), so "Š".compareTo("T") > 0 is true.
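To make the reason concrete, here is a small check of the values involved. The class name CodeUnitOrder is made up for illustration, and Š is written as the escape \u0160 so the snippet does not depend on the source file encoding.

```java
public class CodeUnitOrder {
    public static void main(String[] args) {
        String sCaron = "\u0160"; // the letter Š (U+0160)

        // Numeric value of each character as a UTF-16 code unit.
        System.out.println((int) sCaron.charAt(0)); // prints 352
        System.out.println((int) 'T');              // prints 84

        // For two one-character strings, compareTo returns the difference
        // between the code units, so the result is positive: Š sorts after T.
        System.out.println(sCaron.compareTo("T"));  // prints 268
    }
}
```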

I believe you can put some logic in the compare method to make Š sort just after S, but before T.

The above code was compiled and executed with JDK 1.5.

Calling Collections.sort(list) directly, without a comparator, gives the same result as above, because MyCollator.compare simply delegates to String.compareTo.
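One possible way to get that ordering without hand-writing the comparison is a tailored java.text.RuleBasedCollator. This is only a sketch under some assumptions: it assumes Collator.getInstance(Locale.ENGLISH) returns a RuleBasedCollator (the case in OpenJDK, but not promised by the API), and the names TailoredCollatorSketch, build and sorted are invented for illustration. The appended rule "& S < š , Š" asks for š to be a separate letter right after S, with Š as its case variant.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TailoredCollatorSketch {

    // Builds a collator in which Š is its own letter, placed directly after S.
    public static Collator build() {
        // Assumption: the runtime hands back a RuleBasedCollator for this locale.
        RuleBasedCollator base =
                (RuleBasedCollator) Collator.getInstance(Locale.ENGLISH);

        // "& S" resets the position to S, "<" adds a primary (letter-level)
        // difference, "," adds a tertiary (case-level) difference.
        // \u0161 is š and \u0160 is Š.
        String tailoring = "& S < \u0161 , \u0160";

        try {
            RuleBasedCollator collator =
                    new RuleBasedCollator(base.getRules() + tailoring);
            // Explicit settings, since the question asks about them.
            collator.setStrength(Collator.TERTIARY);
            collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
            return collator;
        } catch (ParseException e) {
            // Only reached if the rule string is malformed.
            throw new IllegalArgumentException("Invalid collation rules", e);
        }
    }

    // Returns a sorted copy of the given names, leaving the input untouched.
    public static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy, build());
        return copy;
    }
}
```

Passing the list from the question to TailoredCollatorSketch.sorted(list) should then place the names starting with Š between Skuodas and Tauragė. Other letters that need the same treatment could be handled by appending further rules of the same form.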

MouseLearnJava answered Nov 15 '22 05:11