
Custom sorting list of strings (following Chamorro language collation rules)

I am trying to sort a list of strings for a Pacific island language (Chamorro). In this language, Ng is considered one letter, and it comes after N in the alphabet. How can I sort a list of words such that Nai and Nunu both come before words that begin with Ng?

Update

The complete alphabet is:

A, Å, B, Ch, D, E, F, G, H, I, K, L, M, N, Ñ, Ng, O, P, R, S, T, U, Y

Apart from Å, Ñ, and their lowercase forms, no letters carry accents. Words can contain apostrophes (such as o'mak), but apostrophes do not affect the sort order.

There is no locale for Chamorro, so I need to manually implement a sort algorithm.

BJ Dela Cruz asked May 27 '14 06:05


1 Answer

Thanks to Dirk Lachowski, I implemented a working solution with java.text.RuleBasedCollator. Here's what I wrote:

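  import java.text.ParseException;
  import java.text.RuleBasedCollator;

  final class Constants {
    // '<' starts the next letter (primary difference); ',' separates case variants (tertiary difference).
    // Multi-character entries such as "ch" and "ng" are collated as single letters.
    static final String CHAMORRO_RULES = "< a,A < å,Å < b,B < ch,Ch < d,D < e,E < f,F < g,G < h,H < i,I < k,K < l,L "
        + "< m,M < n,N < ñ,Ñ < ng,Ng < o,O < p,P < r,R < s,S < t,T < u,U < y,Y";
    static final RuleBasedCollator CHAMORRO_COLLATOR;
    static {
      try {
        CHAMORRO_COLLATOR = new RuleBasedCollator(CHAMORRO_RULES);
      }
      catch (ParseException pe) {
        // The rules are a fixed constant, so a parse failure is a programming error.
        throw new RuntimeException(pe);
      }
    }
  }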
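As a quick sanity check on rules like these, a minimal sketch along the following lines could confirm that Ng is collated as a single letter after N and Ñ. The class name ChamorroCollatorCheck, the helper sortsBefore, and the sample strings "Ngayu", "Ñamu", and "Oba" are made up for illustration and are not part of the original solution; only "Nai" and "Nunu" come from the question. The rule string is the same as above, written with Unicode escapes so the file compiles regardless of source encoding.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical demo class; the rule string mirrors CHAMORRO_RULES from the answer.
final class ChamorroCollatorCheck {
    // Escapes keep this source ASCII-only:
    // U+00E5/U+00C5 are a-ring, U+00F1/U+00D1 are n-tilde.
    static final String RULES =
        "< a,A < \u00E5,\u00C5 < b,B < ch,Ch < d,D < e,E < f,F < g,G < h,H < i,I < k,K < l,L "
        + "< m,M < n,N < \u00F1,\u00D1 < ng,Ng < o,O < p,P < r,R < s,S < t,T < u,U < y,Y";

    static final RuleBasedCollator COLLATOR = build();

    private static RuleBasedCollator build() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException pe) {
            throw new IllegalStateException("Invalid collation rules", pe);
        }
    }

    // True when 'first' sorts strictly before 'second' under the rules.
    static boolean sortsBefore(String first, String second) {
        return COLLATOR.compare(first, second) < 0;
    }

    public static void main(String[] args) {
        // Each line should print true if the rules behave as intended.
        System.out.println(sortsBefore("Nai", "Nunu"));        // same letter N, then a before u
        System.out.println(sortsBefore("Nunu", "Ngayu"));      // N before Ng
        System.out.println(sortsBefore("\u00D1amu", "Ngayu")); // N-tilde before Ng
        System.out.println(sortsBefore("Ngayu", "Oba"));       // Ng before O
    }
}
```

Under plain alphabetical order "Ngayu" would come before "Nunu", so the second check is the one that exercises the digraph rule.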

After I implemented the rule-based collator above, I simply wrote the following sort method:

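  import java.util.Collections;
  import java.util.Comparator;
  import java.util.List;

  // Sorts the list in place using the Chamorro collation rules.
  static void sort(List<String> words) {
    Collections.sort(words, new Comparator<String>() {
      @Override
      public int compare(String lhs, String rhs) {
        return Constants.CHAMORRO_COLLATOR.compare(lhs, rhs);
      }
    });
  }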
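The question also says that apostrophes should not affect the sort order, and the rule string does not mention the apostrophe, so the collator probably does not ignore it on its own. One possible way to handle this, sketched below under stated assumptions, is to strip apostrophes from both strings before handing them to the collator. The class name ChamorroApostropheSort, the helpers stripApostrophes and sorted, and the sample strings "Ngayu" and "opan" are hypothetical; treating U+2019 as an apostrophe alongside U+0027 is also an assumption.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; not part of the original answer.
final class ChamorroApostropheSort {
    // Same rules as in the answer, written with escapes to stay ASCII-only.
    static final String RULES =
        "< a,A < \u00E5,\u00C5 < b,B < ch,Ch < d,D < e,E < f,F < g,G < h,H < i,I < k,K < l,L "
        + "< m,M < n,N < \u00F1,\u00D1 < ng,Ng < o,O < p,P < r,R < s,S < t,T < u,U < y,Y";

    static final RuleBasedCollator COLLATOR = build();

    private static RuleBasedCollator build() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException pe) {
            throw new IllegalStateException("Invalid collation rules", pe);
        }
    }

    // Assumption: the apostrophe may be typed as U+0027 or as U+2019.
    static String stripApostrophes(String word) {
        return word.replace("'", "").replace("\u2019", "");
    }

    // Compares two words with apostrophes removed, using the Chamorro rules.
    static final Comparator<String> IGNORING_APOSTROPHES =
        (lhs, rhs) -> COLLATOR.compare(stripApostrophes(lhs), stripApostrophes(rhs));

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sorted(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(IGNORING_APOSTROPHES);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("opan", "o'mak", "Ngayu", "Nunu", "Nai")));
    }
}
```

With this comparator, "o'mak" is compared as "omak" and therefore sorts before "opan". Words that differ only by an apostrophe compare as equal, so their relative order is whatever the stable sort preserves from the input.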
BJ Dela Cruz answered Oct 23 '22 19:10