For a Java program I'm writing, I have a particular need to sort strings lexicographically by Unicode code point. This is not the same as String.compareTo() when you start dealing with values outside the Basic Multilingual Plane. String.compareTo() compares strings lexicographically on 16-bit char values. To see that this is not equivalent, note that U+FD00 ARABIC LIGATURE HAH WITH YEH ISOLATED FORM is less than U+1D11E MUSICAL SYMBOL G CLEF, but the Java String object "\uFD00" for the Arabic character compares greater than the surrogate pair "\uD834\uDD1E" for the clef.
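A small check of the claim above, using only java.lang. String.compareTo() returns the difference of the first differing char values, so for this pair it computes 0xFD00 - 0xD834 = 9420, while comparing the actual code points gives the opposite sign. CompareToDemo is just an illustrative class name.

```java
public class CompareToDemo {
    public static void main(String[] args) {
        String arabic = "\uFD00";       // U+FD00: a single UTF-16 char
        String clef = "\uD834\uDD1E";   // U+1D11E: a surrogate pair

        // compareTo looks at 16-bit chars: 0xFD00 vs the high surrogate 0xD834
        System.out.println(arabic.compareTo(clef));  // prints 9420 (positive)

        // Comparing the code points themselves gives the opposite result
        int cpArabic = arabic.codePointAt(0);  // 0xFD00
        int cpClef = clef.codePointAt(0);      // 0x1D11E
        System.out.println(Integer.compare(cpArabic, cpClef) < 0);  // prints true
    }
}
```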
I can manually loop along the code points using String.codePointAt() and Character.charCount() and do the comparison myself if necessary. Is there an API function or other more "canonical" way of doing this?
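A minimal sketch of the manual loop described above, packaged as a reusable Comparator<String>. CodePointComparator is a hypothetical name for illustration, not a JDK class; as far as I know, java.lang.String does not ship a code-point comparator, so something along these lines is needed unless a third-party library provides one.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper: orders strings by Unicode code point, not by UTF-16 char.
public final class CodePointComparator implements Comparator<String> {
    public static final CodePointComparator INSTANCE = new CodePointComparator();

    @Override
    public int compare(String a, String b) {
        int i = 0;
        // Equal code points occupy the same number of chars,
        // so a single index stays aligned in both strings.
        while (i < a.length() && i < b.length()) {
            int cpA = a.codePointAt(i);
            int cpB = b.codePointAt(i);
            if (cpA != cpB) {
                return Integer.compare(cpA, cpB);
            }
            i += Character.charCount(cpA);
        }
        // Every code point so far matched: the shorter string sorts first.
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        String[] words = {"\uFD00", "\uD834\uDD1E", "A"};
        Arrays.sort(words, INSTANCE);
        for (String w : words) {
            System.out.printf("U+%04X%n", w.codePointAt(0));  // U+0041, U+FD00, U+1D11E
        }
    }
}
```

An unpaired surrogate is compared by its own char value, since that is what codePointAt() returns for it.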
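The compareTo() method compares two strings lexicographically. The comparison is based on the UTF-16 char value of each character in the strings (for characters outside the Basic Multilingual Plane, that means their surrogate chars rather than their code points). The method returns 0 if the strings are equal, a negative number if this string sorts before the other, and a positive number if it sorts after.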
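A few calls illustrating the return value, based on the documented rule: the result is the difference of the first pair of differing char values, or the difference in lengths when one string is a prefix of the other. CompareToBasics is an illustrative name.

```java
public class CompareToBasics {
    public static void main(String[] args) {
        // First differing chars: 'a' (97) - 'b' (98)
        System.out.println("apple".compareTo("banana"));  // -1
        // No differing char, so the length difference is returned: 2 - 3
        System.out.println("ab".compareTo("abc"));        // -1
        // Identical content
        System.out.println("same".compareTo("same"));     // 0
        // Uppercase letters have smaller char values than lowercase: 'Z' (90) - 'a' (97)
        System.out.println("Zebra".compareTo("apple"));   // -7
    }
}
```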
Using String.equals(): in Java, the equals() method compares two strings by their content. If both strings contain exactly the same characters, it returns true; if any character does not match, it returns false.
Two strings are lexicographically equal if they are the same length and contain the same characters in the same positions.
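A short sketch relating the two methods: equals() is true exactly when both strings have the same length and the same chars in the same positions, which is also exactly when compareTo() returns 0. EqualsDemo is an illustrative name.

```java
public class EqualsDemo {
    public static void main(String[] args) {
        String a = "hello";
        String b = new String("hello");  // distinct object, same contents

        System.out.println(a.equals(b));          // true: same chars, same positions
        System.out.println(a.compareTo(b) == 0);  // true: consistent with equals
        System.out.println(a.equals("Hello"));    // false: 'h' differs from 'H'
        System.out.println(a.equals("hell"));     // false: different length
    }
}
```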
It's called collation. See https://docs.oracle.com/javase/tutorial/i18n/text/locale.html
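A minimal sketch of collation with java.text.Collator, assuming the US English locale. A Collator applies locale-sensitive rules (for example, case is only a tertiary difference, so "apple" sorts before "Cherry"), which means it does not reproduce raw code-point order; it is shown here only to illustrate what collation does. CollatorDemo and sortWithCollator are hypothetical names.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollatorDemo {
    // Returns a sorted copy of input using the locale's collation rules.
    public static List<String> sortWithCollator(List<String> input, Locale locale) {
        List<String> copy = new ArrayList<>(input);
        // Collator implements Comparator<Object>, so it can sort a List<String>.
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "apple", "Cherry");

        // Natural ordering uses String.compareTo, so uppercase sorts first.
        List<String> natural = new ArrayList<>(words);
        Collections.sort(natural);
        System.out.println(natural);                             // [Cherry, apple, banana]

        System.out.println(sortWithCollator(words, Locale.US));  // [apple, banana, Cherry]
    }
}
```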
Note that your database can also sort query results using collations. See, for example, what MySQL supports: https://dev.mysql.com/doc/refman/5.0/en/charset-charsets.html