Is the Java char type guaranteed to be stored in any particular encoding?
Edit: I phrased this question incorrectly. What I meant to ask is: are char literals guaranteed to use any particular encoding?
The native character encoding of the Java programming language is UTF-16. A charset in the Java platform therefore defines a mapping between sequences of sixteen-bit UTF-16 code units (that is, sequences of chars) and sequences of bytes.
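To make that mapping concrete, here is a minimal sketch that encodes the same two chars with two different charsets. The class name CharsetMappingDemo and its helper methods are made up for illustration; only String.getBytes and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMappingDemo {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies as big-endian UTF-16 (no byte order mark).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String text = "h\u00e9"; // 'h' followed by e-acute, two chars in memory
        System.out.println(text.length());     // 2
        System.out.println(utf8Length(text));  // 3
        System.out.println(utf16Length(text)); // 4
    }
}
```

The in-memory string is the same in both cases; only the byte sequence produced by the chosen charset differs.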
A Java char is 16 bits in size, double that of a byte. For ASCII or Latin-1 text, the value stored in a char fits in 8 bits. Java chose 16 bits so that one char could hold any Unicode character, which was true of early Unicode. Today a char holds a single UTF-16 code unit, and characters above U+FFFF need two chars (a surrogate pair).
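As a rough illustration of what a char literal holds, the sketch below widens a few literals to int to expose their numeric values. CharLiteralDemo and codeUnit are illustrative names, not part of any library.

```java
public class CharLiteralDemo {
    // Widening a char to int exposes its UTF-16 code unit value.
    static int codeUnit(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);            // 16 (bits per char)
        System.out.println(codeUnit('A'));             // 65
        System.out.println(codeUnit('\u00e9'));        // 233
        System.out.println((int) Character.MAX_VALUE); // 65535
    }
}
```

Whatever encoding the source file is saved in, the compiler turns each char literal into the same 16-bit code unit value.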
Java's .class files use a modified form of UTF-8 internally to store string literals. DataInputStream and DataOutputStream also read and write strings in this modified UTF-8 (through readUTF and writeUTF).
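One visible difference between the stream format and standard UTF-8 is how the null character is written. The sketch below, with the made-up names ModifiedUtf8Demo and writeUtf, compares the two. Note that writeUTF also prepends a two-byte length.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    // Returns the raw bytes DataOutputStream.writeUTF produces for a string.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String nul = "\u0000";
        byte[] standard = nul.getBytes(StandardCharsets.UTF_8);
        byte[] modified = writeUtf(nul);

        System.out.println(standard.length); // 1 (a single 0x00 byte)
        System.out.println(modified.length); // 4 (2-byte length, then 0xC0 0x80)
    }
}
```

Because of this difference, bytes written with writeUTF should be read back with readUTF rather than decoded as plain UTF-8.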
Java supports a wide array of encodings and conversions between them. The Charset class documents a set of standard charsets that every implementation of the Java platform is required to support: US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16.
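A quick way to check this on a given runtime is to query Charset directly. This is only a sketch; StandardCharsetsDemo and isAvailable are hypothetical names, while Charset.isSupported, Charset.forName, and StandardCharsets are JDK APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StandardCharsetsDemo {
    // True if the running JVM supports the named charset.
    static boolean isAvailable(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        String[] required = {
            "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
        };
        for (String name : required) {
            System.out.println(name + ": " + isAvailable(name));
        }
        // Looking a charset up by name yields an object equal to the constant.
        System.out.println(Charset.forName("UTF-8").equals(StandardCharsets.UTF_8));
    }
}
```

For the required charsets, the constants in StandardCharsets avoid the string lookup and the exception handling that some name-based APIs require.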
Originally, Java used UCS-2 internally; now it uses UTF-16. The two are virtually identical, except for the range U+D800 to U+DFFF, which UTF-16 reserves for surrogate pairs that encode characters above U+FFFF.
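The surrogate mechanism can be seen with any code point above U+FFFF. The sketch below uses U+1F600 as an arbitrary example; SurrogatePairDemo and codeUnits are illustrative names.

```java
public class SurrogatePairDemo {
    // Returns the UTF-16 code units Java uses for one Unicode code point.
    static char[] codeUnits(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] units = codeUnits(0x1F600); // a code point above U+FFFF
        String s = new String(units);

        System.out.println(units.length);                        // 2
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true
        System.out.println(s.length());                          // 2 (code units)
        System.out.println(s.codePointCount(0, s.length()));     // 1 (character)
    }
}
```

This is why String.length() counts UTF-16 code units rather than characters, and why code that must handle supplementary characters typically works with code points instead of individual chars.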