Why .class
is UTF-8, but runtime .class
is UTF-16?
UTF-16 is, obviously, more efficient for A) characters for which UTF-16 requires fewer bytes to encode than does UTF-8. UTF-8 is, obviously, more efficient for B) characters for which UTF-8 requires fewer bytes to encode than does UTF-16.
Why use UTF-8? An HTML page can only be in one encoding. You cannot encode different parts of a document in different encodings. A Unicode-based encoding such as UTF-8 can support many languages and can accommodate pages and forms in any mixture of those languages.
Both UTF-8 and UTF-16 are variable length encodings. However, in UTF-8 a character may occupy a minimum of 8 bits, while in UTF-16 character length starts with 16 bits. Main UTF-8 pros: Basic ASCII characters like digits, Latin characters with no accents, etc.
UTF-8 represents a variable-width character encoding that uses between one and four eight-bit bytes to represent all valid Unicode code points. A code point can represent single characters, but also have other meanings, such as for formatting.
In UTF-16, the encoded file size is nearly twice of UTF-8 while encoding ASCII characters. So, UTF-8 is more efficient as it requires less space. UTF-16 is not backward compatible with ASCII where UTF-8 is well compatible.
UTF stands for Unicode Transformation, which defines an algorithm to map every Unicode code point to a unique byte sequence. For example, for character A, which is Latin Capital A, Unicode code point is U+0041, UTF-8 encoded bytes are 41, UTF-16 encoding is 0041, and Java char literal is '\u0041'.
An ASCII encoded file is identical with a UTF-8 encoded file that uses only ASCII characters. As UTF-8 is a byte-oriented format unlike UTF-16, So, no needs of byte order established in the case of UTF-8. UTF-8 is better than UTF-16 in error recovery corrupting portion of the file by decoding the next uncorrupting bytes.
The UTF-16 is not byte-oriented, and also it is not compatible with ASCII characters. The UTF-16 is the oldest encoding standard in the field of the Unicode series. The various application of the UTF-16 is the use in Microsoft Windows, JavaScript, and Java programming internally.
Why .class is UTF-8
For classes written for a Western audience, which are usually mostly ASCII, this is the most compact encoding.
but runtime .class is UTF-16?
At runtime it's quicker to manipulate strings that use a fixed-width encoding (Why Java char uses UTF-16?), so UCS-2 was chosen. This is complicated by the change from UCS-2 to UTF-16 making this another variable-width encoding.
As noted in the comments of that question, JEP 254 allows for the runtime representation to change to something more space efficient (e.g., Latin-1).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With