Java array sort UTF-8

I want to sort an ArrayList<String>, but my native language's characters cause a problem. My alphabet goes like this: a, ą, b, c, č, d, e, f ... z, ž. As you can see, z is second from the end and ą is second in the alphabet, so after I sort the list the order is wrong: all my native-language characters are moved to the end. Example:

package lt;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class test {
    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        items.add("bbc");
        items.add("ąbc");
        items.add("abc");
        items.add("zzz");

        System.out.println("Unsorted: ");
        for(String str : items) {
            System.out.println(str);
        }

        Collections.sort(items);
        System.out.println();

        System.out.println("Sorted: ");
        for(String str : items) {
            System.out.println(str);
        }
    }
}

Output:

Unsorted: 
bbc
ąbc
abc
zzz

Sorted: 
abc
bbc
zzz
ąbc

Should be:

Sorted:
abc
ąbc
bbc
zzz
Minutis asked Feb 13 '12 13:02


2 Answers

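You should use the Collator class.

For example:

// Language and country are separate arguments; "lt_LT" passed as one string is not a valid language code.
Locale lithuanian = new Locale("lt", "LT");
Collator lithuanianCollator = Collator.getInstance(lithuanian);

Then sort the collection using this collator: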

Collections.sort(theList, lithuanianCollator);
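
Put together with the list from the question, this could look like the sketch below. The class name CollatorSortExample and the helper sortLithuanian are made up for illustration. The locale is built with Locale.forLanguageTag("lt-LT"), which names the same language and country and avoids the Locale constructors that recent JDKs mark as deprecated.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorSortExample {

    // Hypothetical helper: returns a new list sorted with a Lithuanian
    // collator; the input list is left unmodified.
    static List<String> sortLithuanian(List<String> items) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("lt-LT"));
        List<String> sorted = new ArrayList<>(items);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        // "\u0105" is the escape for a-with-ogonek, used here so the file
        // compiles the same way under any source encoding.
        List<String> items = Arrays.asList("bbc", "\u0105bc", "abc", "zzz");
        for (String s : sortLithuanian(items)) {
            System.out.println(s);
        }
    }
}
```

With this comparator the ą entry should land between abc and bbc, which is the order the question asks for. How the character is displayed still depends on the console encoding.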
Vic answered Nov 06 '22 04:11


You can use Collator to do locale-sensitive String comparisons.
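
A minimal sketch of a single comparison, to show where the two orderings differ. The class and method names are hypothetical. String.compareTo compares UTF-16 code units, so ą (U+0105) sorts after every ASCII letter, while a Collator applies the language's ordering rules.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorCompareExample {

    // Plain String ordering: compares UTF-16 code unit values.
    static int byCodeUnit(String a, String b) {
        return a.compareTo(b);
    }

    // Locale-sensitive ordering using a Lithuanian collator.
    static int byLithuanianRules(String a, String b) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("lt-LT"));
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String ogonek = "\u0105bc"; // a-with-ogonek followed by "bc"
        // Positive: by code unit, U+0105 comes after 'b'.
        System.out.println(Integer.signum(byCodeUnit(ogonek, "bbc")));
        // Negative: the collator places a-with-ogonek before 'b'.
        System.out.println(Integer.signum(byLithuanianRules(ogonek, "bbc")));
    }
}
```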

Kris answered Nov 06 '22 03:11