Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why UTF-8 is used in class file and UTF-16 in in runtime?

Tags:

java

encoding

Why .class is UTF-8, but runtime .class is UTF-16?

enter image description here

like image 774
Jerry_W Avatar asked Dec 30 '16 09:12

Jerry_W


People also ask

What is the advantage of using UTF-8 instead of UTF-16?

UTF-16 is, obviously, more efficient for A) characters for which UTF-16 requires fewer bytes to encode than does UTF-8. UTF-8 is, obviously, more efficient for B) characters for which UTF-8 requires fewer bytes to encode than does UTF-16.

Why do we use UTF-8 encoding?

Why use UTF-8? An HTML page can only be in one encoding. You cannot encode different parts of a document in different encodings. A Unicode-based encoding such as UTF-8 can support many languages and can accommodate pages and forms in any mixture of those languages.

Is UTF-8 and UTF-16 the same?

Both UTF-8 and UTF-16 are variable length encodings. However, in UTF-8 a character may occupy a minimum of 8 bits, while in UTF-16 character length starts with 16 bits. Main UTF-8 pros: Basic ASCII characters like digits, Latin characters with no accents, etc.

What is the use of UTF-8 in Java?

UTF-8 represents a variable-width character encoding that uses between one and four eight-bit bytes to represent all valid Unicode code points. A code point can represent single characters, but also have other meanings, such as for formatting.

What is the difference between UTF 8 and UTF 16?

In UTF-16, the encoded file size is nearly twice of UTF-8 while encoding ASCII characters. So, UTF-8 is more efficient as it requires less space. UTF-16 is not backward compatible with ASCII where UTF-8 is well compatible.

What is UTF-8 encoding in Java?

UTF stands for Unicode Transformation, which defines an algorithm to map every Unicode code point to a unique byte sequence. For example, for character A, which is Latin Capital A, Unicode code point is U+0041, UTF-8 encoded bytes are 41, UTF-16 encoding is 0041, and Java char literal is '\u0041'.

What is the difference between an ASCII and a UTF-8 file?

An ASCII encoded file is identical with a UTF-8 encoded file that uses only ASCII characters. As UTF-8 is a byte-oriented format unlike UTF-16, So, no needs of byte order established in the case of UTF-8. UTF-8 is better than UTF-16 in error recovery corrupting portion of the file by decoding the next uncorrupting bytes.

Is UTF-16 byte-oriented?

The UTF-16 is not byte-oriented, and also it is not compatible with ASCII characters. The UTF-16 is the oldest encoding standard in the field of the Unicode series. The various application of the UTF-16 is the use in Microsoft Windows, JavaScript, and Java programming internally.


1 Answers

Why .class is UTF-8

For classes written for a Western audience, which are usually mostly ASCII, this is the most compact encoding.

but runtime .class is UTF-16?

At runtime it's quicker to manipulate strings that use a fixed-width encoding (Why Java char uses UTF-16?), so UCS-2 was chosen. This is complicated by the change from UCS-2 to UTF-16 making this another variable-width encoding.

As noted in the comments of that question, JEP 254 allows for the runtime representation to change to something more space efficient (e.g., Latin-1).

like image 151
Joe Avatar answered Sep 30 '22 01:09

Joe