
Unicode escape sequence for non-BMP plane character

In Java, a Unicode character can be written as a Unicode escape sequence, where each escape denotes one UTF-16 code unit. Below is an example that represents a character in the Basic Multilingual Plane (BMP):

char ch = '\u00A5'; // '¥'

Can a surrogate pair be used in a char literal for a non-BMP character?

char ch4 = '\uD800\uDC00'; //Invalid character constant

How do I represent a non-BMP character using Java syntax?

overexchange asked Sep 12 '25 23:09


1 Answer

You cannot do that with a single char constant, since a char is a UTF-16 code unit. You have to use a String constant, such as:

final String s = "\uXXXX\uYYYY";

where XXXX is the high surrogate and YYYY is the low surrogate.
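As a concrete illustration of the String approach, here is a minimal sketch that uses U+1F4AE as the example code point; its surrogate pair works out to D83D (high) and DCAE (low). The class and constant names are made up for this example.

```java
public class SurrogatePairLiteral {
    // U+1F4AE written as its UTF-16 surrogate pair: high D83D, low DCAE.
    // (Hypothetical constant name, chosen for this sketch.)
    static final String WHITE_FLOWER = "\uD83D\uDCAE";

    public static void main(String[] args) {
        // The string holds two char values (UTF-16 code units)...
        System.out.println(WHITE_FLOWER.length()); // prints 2
        // ...but only one Unicode code point.
        System.out.println(WHITE_FLOWER.codePointCount(0, WHITE_FLOWER.length())); // prints 1
        // Reading the code point back gives the original value.
        System.out.println(Integer.toHexString(WHITE_FLOWER.codePointAt(0))); // prints 1f4ae
    }
}
```

Note that length() counts code units, not characters, which is why it reports 2 here.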

Another solution is to use an int to store the code point; you can then use Character.toChars() to obtain a char[] out of it:

final int codePoint = 0x1f4ae; // for instance
final char[] toChars = Character.toChars(codePoint);
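To show what the resulting char[] looks like, the following sketch wraps the conversion in a small helper and turns the array into a String. The class and method names (CodePointToChars, fromCodePoint) are invented for this example; only Character.toChars and the String(char[]) constructor come from the JDK.

```java
public class CodePointToChars {
    // Hypothetical helper: converts one code point to a String
    // via Character.toChars, which returns 1 or 2 UTF-16 code units.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        char[] units = Character.toChars(0x1f4ae);
        // A supplementary (non-BMP) code point needs two code units.
        System.out.println(units.length); // prints 2
        System.out.println(Character.isHighSurrogate(units[0])); // prints true
        System.out.println(Character.isLowSurrogate(units[1])); // prints true

        // A BMP code point such as U+00A5 fits in a single char.
        System.out.println(fromCodePoint(0x00A5).length()); // prints 1
    }
}
```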

Depending on what you use, you may also append code points directly; StringBuilder, for instance, has an appendCodePoint(int) method for that.
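A short sketch of the StringBuilder route, assuming you have the code points as int values. The class and method names (AppendCodePoints, build) are hypothetical; appendCodePoint itself is the standard StringBuilder method.

```java
public class AppendCodePoints {
    // Hypothetical helper: builds a String from any number of code points.
    static String build(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            // Appends one char for BMP code points, a surrogate pair otherwise.
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+0041 and U+00A5 are in the BMP; U+1F4AE is not.
        String s = build(0x41, 0x1f4ae, 0x00A5);
        System.out.println(s.length()); // prints 4 (1 + 2 + 1 code units)
        System.out.println(s.codePointCount(0, s.length())); // prints 3
    }
}
```

This avoids handling the surrogates by hand, since the builder works out the split into code units itself.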

fge answered Sep 15 '25 13:09