I have a collection of strings that I need to sort. I'm using a Collator, but the output is weird.
final Collator collator = Collator.getInstance(Locale.US);
List<String> data = new ArrayList<String>();
data.add("1Z5800701_AB");
data.add("1Z5800701_AC");
data.add("1Z5800701-A");
data.add("1Z5800701 A");
data.add("1Z5800701B");
data.add("1Z5800701A");
data.add("1Z5800701 - A");
Collections.sort(data, new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
        return collator.compare(o1, o2);
    }
});
for (String s : data) {
    System.out.println(s);
}
And the output is:
1Z5800701_AB
1Z5800701_AC
1Z5800701A
1Z5800701 A
1Z5800701 - A
1Z5800701-A
1Z5800701B
The last string, '1Z5800701B', should come after '1Z5800701A'. What am I missing here?
It's a matter of the locale used; you can reproduce the same behavior in a bash shell with LC_ALL=en_US sort. The point is that "word separators" (here, the space and the hyphen) are treated differently from "word characters" in this locale, so you can't always say that character X sorts before or after character B: it depends on context. As a result, 1Z5800701 <optional separators> A sorts before 1Z5800701 <optional separators> B, which is why 1Z5800701B comes after every combination in which the A follows the digits, optionally separated by "separators". You can also see some more examples of "not obvious" orderings in this Wikipedia article.
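If a locale-independent order is acceptable for identifiers like these, one option is to skip the Collator and compare the strings by their character codes. The following is only a minimal sketch under that assumption, not the only possible fix; the class name SortComparison and the helper sortedCopy are hypothetical names introduced for illustration. It sorts the same data both ways so the two orderings can be compared side by side.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class SortComparison {

    // Hypothetical helper: returns a sorted copy and leaves the input untouched.
    static List<String> sortedCopy(List<String> input, Comparator<? super String> order) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList(
                "1Z5800701_AB", "1Z5800701_AC", "1Z5800701-A", "1Z5800701 A",
                "1Z5800701B", "1Z5800701A", "1Z5800701 - A");

        // Locale-aware order: a Collator is itself a Comparator, so it can be
        // passed directly. Spaces and hyphens carry little weight here.
        Collator collator = Collator.getInstance(Locale.US);
        System.out.println("Collator:      " + sortedCopy(data, collator));

        // Plain String.compareTo order: every character, including spaces and
        // hyphens, is compared by its UTF-16 code unit value.
        System.out.println("Natural order: " + sortedCopy(data, Comparator.<String>naturalOrder()));
    }
}
```

With the natural order, the space (U+0020) and the hyphen (U+002D) sort before the letters and the underscore (U+005F) sorts after them, so 1Z5800701B lands directly after 1Z5800701A while the two underscore entries move to the end. The trade-off is that this order is case-sensitive and ignores language-specific rules, which is what the Collator is meant to handle.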