 

Linguistic sorting (German) with Java

Sorting strings that contain digits works differently from one language to another. For example, in English, digits come before letters in ascending order. In German, however, digits are sorted after letters.

I tried to sort strings using a Collator as follows:

private Collator collator = Collator.getInstance(Locale.GERMANY);
collator.compare(str1, str2)

However, this comparison does not apply the digits-after-letters rule.

Does anyone have an idea why Java does not take this rule (digits after letters) into account? For the time being, I am using a RuleBasedCollator as follows:

private final String sortOrder = "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I < j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R < s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z < 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9";

private Collator collator = new RuleBasedCollator(sortOrder);
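For illustration, here is a minimal sketch of how a collator built from such a rule string could be used to sort a list. The class name `LettersBeforeDigitsDemo` and the method `sortLettersFirst` are made up for this example. Note that the `RuleBasedCollator` constructor declares the checked `ParseException`, so it has to be handled or declared somewhere. The rule string only lists a-z and 0-9, so umlauts and ß are not covered by it.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LettersBeforeDigitsDemo {

    // Same rule string as above: letters first, then digits.
    static final String SORT_ORDER =
        "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I"
        + " < j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R"
        + " < s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z"
        + " < 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9";

    // Hypothetical helper: returns a sorted copy, leaving the input untouched.
    static List<String> sortLettersFirst(List<String> input) throws ParseException {
        RuleBasedCollator collator = new RuleBasedCollator(SORT_ORDER);
        List<String> copy = new ArrayList<>(input);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) throws ParseException {
        // prints [a1, b, 1a, 2]
        System.out.println(sortLettersFirst(Arrays.asList("1a", "2", "b", "a1")));
    }
}
```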
Amir asked Oct 08 '12 09:10



1 Answer

You can check/debug the source code to see why nothing changes:

Collator.getInstance(Locale.GERMANY);

calls the following piece of code:

public static synchronized
Collator getInstance(Locale desiredLocale)
{
    // Snipping some code here
    String colString = "";
    try {
        ResourceBundle resource = LocaleData.getCollationData(desiredLocale);

        colString = resource.getString("Rule");
    } catch (MissingResourceException e) {
        // Use default values
    }
    try
    {
        result = new RuleBasedCollator( CollationRules.DEFAULTRULES +
                                        colString,
                                        CANONICAL_DECOMPOSITION );
    }
// Snipping some more code here

Here you can see that the locale-specific rules (colString, which is empty in your case anyway) are appended after the defaults (CollationRules.DEFAULTRULES).

And, as you have discovered, the defaults place the numerics first:

  // NUMERICS

    + "<0<1<2<3<4<5<6<7<8<9"
    + "<\u00bc<\u00bd<\u00be"   // 1/4,1/2,3/4 fractions

    // NON-IGNORABLES
    + "<a,A"
    + "<b,B"
    + "<c,C"
    + "<d,D"
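The effect of these defaults can also be checked without stepping through the JDK source. The following is only a small sketch; the class name `GermanDefaultOrderDemo` and the method `sortGerman` are made up for this example. It sorts a few one-character strings with the collator returned for `Locale.GERMANY`:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class GermanDefaultOrderDemo {

    // Hypothetical helper: returns a copy sorted with the stock German collator.
    static List<String> sortGerman(List<String> input) {
        Collator collator = Collator.getInstance(Locale.GERMANY);
        List<String> copy = new ArrayList<>(input);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        // Digits end up before letters, as the default rules suggest.
        // prints [1, 2, a, b]
        System.out.println(sortGerman(Arrays.asList("b", "2", "a", "1")));
    }
}
```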
Jasper answered Sep 28 '22 03:09
