I'd like to write the Unicode literal U+10428 in Java: http://www.marathon-studios.com/unicode/U10428/Deseret_Small_Letter_Long_I
I tried with '\u10428' and it doesn't compile.
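Unicode character literals: to represent a Unicode character in Java source, use the escape \u followed by exactly four hexadecimal digits (for example, \u0041 is the letter A). Unicode escapes are translated before the rest of the source is parsed, so they can be used anywhere in Java code, including in identifiers, which may contain Unicode letters and digits.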
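Because a Unicode escape always consumes exactly four hex digits, the '\u10428' from the question is read as U+1042 followed by the ordinary character 8. That is two characters, which a char literal cannot hold, hence the compile error. The minimal sketch below puts the same five-digit escape in a string literal, where two characters are legal, so the split becomes visible. The class and method names are hypothetical.

```java
public class UnicodeEscapeSplit {
    // The escape below takes only four hex digits (1042);
    // the trailing digit 8 becomes a separate, ordinary character.
    static String fiveDigitLiteral() {
        return "\u10428";
    }

    public static void main(String[] args) {
        String s = fiveDigitLiteral();
        System.out.println(s.length());                       // 2
        System.out.println(Integer.toHexString(s.charAt(0))); // 1042
        System.out.println(s.charAt(1));                      // 8
    }
}
```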
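Note that this form is SQL, not Java: in standard SQL (and PostgreSQL), a Unicode string literal is a sequence of ASCII characters intermixed with escaped sequences of hex digits, all enclosed in single quotes and preceded by U&, as in U&'d\0061t\+000061'.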
Unicode is an international character-encoding standard capable of representing the majority of the world's written languages. Code points are conventionally written in hexadecimal, as in U+10428. Unicode itself is not a 16-bit system: code points range from U+0000 to U+10FFFF. What is 16 bits wide is Java's char type, a single UTF-16 code unit, whose values run from \u0000 to \uFFFF; that range is the Basic Multilingual Plane (BMP).
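The difference between Java's 16-bit char and the full Unicode code-point range can be checked directly against constants in java.lang.Character. A minimal sketch (the class name is hypothetical):

```java
public class CharRange {
    public static void main(String[] args) {
        // A Java char is one 16-bit UTF-16 code unit.
        System.out.println(Character.SIZE);                                // 16
        System.out.println(Integer.toHexString(Character.MAX_VALUE));      // ffff
        // Unicode code points go all the way up to U+10FFFF.
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT)); // 10ffff
        // U+10428 lies above the BMP, so it needs two chars.
        System.out.println(Character.charCount(0x10428));                  // 2
    }
}
```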
We can determine the Unicode general category of a character with Character.getType(). It is a static method of the Character class; it accepts a char or (since Java 5) an int code point and returns an int identifying the general category, which can be compared against constants such as Character.LOWERCASE_LETTER.
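A minimal sketch applying Character.getType() to the character from the question. The int overload is used because U+10428 does not fit in a single char; the class name is hypothetical.

```java
public class GeneralCategory {
    public static void main(String[] args) {
        int deseretLongI = 0x10428; // DESERET SMALL LETTER LONG I
        // getType(int) handles supplementary code points; getType(char) cannot.
        int type = Character.getType(deseretLongI);
        System.out.println(type == Character.LOWERCASE_LETTER);                    // true
        System.out.println(Character.getType('A') == Character.UPPERCASE_LETTER);  // true
    }
}
```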
Because Java went all-in on Unicode back when people thought 64K would be enough for everyone (where have we heard that before?), it started out with UCS-2 and later upgraded to UTF-16.
But they never bothered to add an escape sequence for Unicode characters outside the BMP.
Thus, if you want to write such a character as an escape, your only recourse is to recode it manually into a UTF-16 surrogate pair and use two \u escapes.
Your example code point U+10428 is "\uD801\uDC28".
I used this site for the recoding: https://rishida.net/tools/conversion/
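The recoding such a tool performs is standard UTF-16 arithmetic: subtract 0x10000 to get a 20-bit value, add its top 10 bits to 0xD800 for the high surrogate and its bottom 10 bits to 0xDC00 for the low surrogate. A minimal sketch, with hypothetical class and method names, cross-checked against the JDK's own Character.toChars():

```java
public class SurrogatePair {
    // Splits a supplementary code point (U+10000..U+10FFFF)
    // into its UTF-16 high and low surrogates.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;              // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10)); // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x10428);
        // Print the pair in the form of two Unicode escapes.
        System.out.printf("\\u%04X\\u%04X%n", (int) pair[0], (int) pair[1]);
        // The JDK performs the same conversion.
        System.out.println(new String(pair).equals(new String(Character.toChars(0x10428)))); // true
    }
}
```

If the character only needs to exist at runtime rather than as a source literal, new String(Character.toChars(0x10428)) builds the same string without any manual arithmetic, and Java 11 added Character.toString(int codePoint) for the same purpose.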
Quote from the Java Language Specification:
3.10.5 String Literals
A string literal consists of zero or more characters enclosed in double quotes. Characters may be represented by escape sequences (§3.10.6) - one escape sequence for characters in the range U+0000 to U+FFFF, two escape sequences for the UTF-16 surrogate code units of characters in the range U+010000 to U+10FFFF.
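In practice the rule quoted above means the two escapes produce a String of two UTF-16 code units that together encode one code point. A char literal cannot hold U+10428, so it has to live in a String (or as an int code point). A minimal sketch; the class and constant names are hypothetical.

```java
public class SupplementaryLiteral {
    // Two escapes, one per UTF-16 surrogate code unit, encode a single code point.
    static final String LONG_I = "\uD801\uDC28";

    public static void main(String[] args) {
        System.out.println(LONG_I.length());                                // 2 code units
        System.out.println(LONG_I.codePointCount(0, LONG_I.length()));      // 1 code point
        System.out.println(Integer.toHexString(LONG_I.codePointAt(0)));     // 10428
        System.out.println(Character.isSurrogatePair(LONG_I.charAt(0), LONG_I.charAt(1))); // true
    }
}
```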