What is the preferred way to compare two Java Strings lexicographically on Unicode code points?

Tags: java, string, unicode

For a Java program I'm writing, I have a particular need to sort strings lexicographically by Unicode code point. This is not the same as String.compareTo() when you start dealing with values outside the Basic Multilingual Plane. String.compareTo() compares strings lexicographically on 16-bit char values. To see that this is not equivalent, note that U+FD00 ARABIC LIGATURE HAH WITH YEH ISOLATED FORM is less than U+1D11E MUSICAL SYMBOL G CLEF, but the Java String object "\uFD00" for the Arabic character compares greater than the surrogate pair "\uD834\uDD1E" for the clef.
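The mismatch described above can be reproduced with a few lines. This is only a minimal sketch: the class name `CompareToDemo` and its constants are made up for illustration, and it uses nothing beyond `String.compareTo()` and `String.codePointAt()`.

```java
final class CompareToDemo {
    // U+FD00 ARABIC LIGATURE HAH WITH YEH ISOLATED FORM (a single char)
    static final String ARABIC_LIGATURE = "\uFD00";
    // U+1D11E MUSICAL SYMBOL G CLEF (a surrogate pair, two chars)
    static final String G_CLEF = "\uD834\uDD1E";

    public static void main(String[] args) {
        // By code point, the ligature (0xFD00) sorts before the clef (0x1D11E).
        boolean lessByCodePoint =
                ARABIC_LIGATURE.codePointAt(0) < G_CLEF.codePointAt(0);
        // By 16-bit char, 0xFD00 is greater than the high surrogate 0xD834.
        boolean greaterByChar = ARABIC_LIGATURE.compareTo(G_CLEF) > 0;

        System.out.println(lessByCodePoint); // true
        System.out.println(greaterByChar);   // true
    }
}
```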

I can manually loop along the code points using String.codePointAt() and Character.charCount() and do the comparison myself if necessary. Is there an API function or other more "canonical" way of doing this?
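For reference, the manual loop mentioned here might look roughly like the following. It is a sketch rather than a canonical API: the class name `CodePointOrder` and its members are hypothetical, and only `String.codePointAt()`, `Character.charCount()` and `Integer.compare()` come from the JDK.

```java
import java.util.Comparator;

final class CodePointOrder {
    // Hypothetical helper: a Comparator that orders strings by code point.
    static final Comparator<String> COMPARATOR = CodePointOrder::compare;

    static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int cpA = a.codePointAt(i);
            int cpB = b.codePointAt(j);
            if (cpA != cpB) {
                // First differing code point decides the order.
                return Integer.compare(cpA, cpB);
            }
            // Advance by 1 char for BMP code points, 2 for surrogate pairs.
            i += Character.charCount(cpA);
            j += Character.charCount(cpB);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }
}
```

It can then be passed wherever a comparator is accepted, for example `list.sort(CodePointOrder.COMPARATOR)`.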


asked Dec 09 '14 17:12

Aaron Rotenberg

1 Answer

It's called collation. See https://docs.oracle.com/javase/tutorial/i18n/text/locale.html
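As a rough illustration of the collation approach, the JDK's `java.text.Collator` can be used as a comparator. The class and method names below (`CollatorSketch`, `sortWithCollator`) are made up for this sketch. Keep in mind that a `Collator` orders strings by locale-specific rules, which is not the same thing as raw code point order.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class CollatorSketch {
    // Hypothetical helper: returns a sorted copy using the locale's collation rules.
    static String[] sortWithCollator(String[] input, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        String[] copy = input.clone();
        // Collator implements Comparator<Object>, so it can be passed directly.
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortWithCollator(new String[] {"b", "a", "C"}, Locale.ENGLISH);
        System.out.println(Arrays.toString(sorted));
    }
}
```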

Note that your database can sort your query results using collations too. See, for example, what MySQL supports: https://dev.mysql.com/doc/refman/5.0/en/charset-charsets.html

answered Oct 07 '22 15:10

jorgeu
