A Java char
is 2 bytes (max size of 65,536) but there are 95,221 Unicode characters. Does this mean that you can't handle certain Unicode characters in a Java application?
Does this boil down to what character encoding you are using?
encoding attribute, Java uses “UTF-8” character encoding by default. Character encoding basically interprets a sequence of bytes into a string of specific characters. The same combination of bytes can denote different characters in different character encoding.
Java uses UTF-16. A single Java char can only represent characters from the basic multilingual plane. Other characters have to be represented by a surrogate pair of two char s. This is reflected by API methods such as String.
Unicode sequences can be used everywhere in Java code. As long as it contains Unicode characters, it can be used as an identifier. You may use Unicode to convey comments, ids, character content, and string literals, as well as other information. However, note that they are interpreted by the compiler early.
The Difference Between Unicode and UTF-8 Unicode is a character set. UTF-8 is encoding. Unicode is a list of characters with unique decimal numbers (code points).
You can handle them all if you're careful enough.
Java's char
is a UTF-16 code unit. For characters with code-point > 0xFFFF it will be encoded with 2 char
s (a surrogate pair).
See http://www.oracle.com/us/technologies/java/supplementary-142654.html for how to handle those characters in Java.
(BTW, in Unicode 5.2 there are 107,154 assigned characters out of 1,114,112 slots.)
Java uses UTF-16. A single Java char
can only represent characters from the basic multilingual plane. Other characters have to be represented by a surrogate pair of two char
s. This is reflected by API methods such as String.codePointAt()
.
And yes, this means that a lot of Java code will break in one way or another when used with characters outside the basic multilingual plane.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With