I know that loading a file in Java without specifying an encoding is platform dependent. My question, however, is about the text contained in the .java source files themselves: is the encoding used for those files still relevant once they are compiled?
For example, if I have a test.java file on Windows which is Cp1252-encoded and contains:
private String encodingTest = "Bœuf fûmé";
If I compile it using -encoding Cp1252, what exactly happens to this text in the resulting .class file? Does the encoding still matter, or does Java convert it to a standard encoding when compiling?
Will the resulting .class file be platform dependent? Could I get a different result if I output this text on Windows, Linux, or Solaris? Can an encoding setting on the server affect how this text is rendered in one way or another?
The source file encoding is very relevant at compile time, as the OP notes: javac uses it to turn the bytes of the .java file into characters. After compilation, however, all literal text is stored as modified UTF-8 strings.
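To see why the flag matters at compile time, here is a minimal sketch that reproduces only the decoding step javac performs on the raw bytes of a source file; it does not invoke the compiler. The class and method names are made up for illustration, and it assumes the JDK provides the Cp1252 (windows-1252) charset, which standard OpenJDK builds include.

```java
import java.nio.charset.Charset;

/**
 * Hypothetical demo: reproduces the decoding step controlled by javac's
 * -encoding flag. It does not run the compiler.
 */
public class SourceDecodingDemo {

    // Turns raw source-file bytes into characters, as the compiler does
    // with the charset named by -encoding.
    static String decodeSource(byte[] sourceBytes, String charsetName) {
        return new String(sourceBytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "Bœuf fûmé" written with Unicode escapes, so this file itself
        // compiles identically under any -encoding.
        String literal = "B\u0153uf f\u00fbm\u00e9";

        // The bytes a Cp1252-encoded test.java would contain for the literal.
        byte[] onDisk = literal.getBytes(Charset.forName("Cp1252"));

        String right = decodeSource(onDisk, "Cp1252"); // like -encoding Cp1252
        String wrong = decodeSource(onDisk, "UTF-8");  // like -encoding UTF-8

        System.out.println(right.equals(literal)); // true
        System.out.println(wrong.equals(literal)); // false: invalid UTF-8 bytes become U+FFFD
    }
}
```

Passing the wrong -encoding to javac misreads the literal in test.java in the same way; javac typically flags such bytes as unmappable characters.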
All string literals, class/method/field names and references to them are stored in the constant pool of the .class file in modified UTF-8 encoding:
From the JVM spec (for Java version 1.7):
4.4.7. The CONSTANT_Utf8_info Structure
The CONSTANT_Utf8_info structure is used to represent constant string values:
[...]
String content is encoded in modified UTF-8. Modified UTF-8 strings are encoded so that code point sequences that contain only non-null ASCII characters can be represented using only 1 byte per code point, but all code points in the Unicode codespace can be represented.
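The JDK already implements the same modified UTF-8 format in DataOutputStream.writeUTF, whose output is a u2 length followed by the encoded bytes, the same layout as the length and bytes fields of CONSTANT_Utf8_info. The minimal sketch below (class and method names are hypothetical) compares it with standard UTF-8: for ordinary text such as "Bœuf fûmé" the bytes are identical, and they differ only for the null character and for characters outside the Basic Multilingual Plane.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {

    // Encodes s in modified UTF-8 using DataOutputStream.writeUTF, then drops
    // the 2-byte length prefix that writeUTF writes first.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s); // throws UTFDataFormatException above 65535 encoded bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buf.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    // Formats bytes as space-separated hex, e.g. "61 C0 80 62".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] samples = {
            "B\u0153uf f\u00fbm\u00e9", // "Bœuf fûmé": same bytes as standard UTF-8
            "a\u0000b",                 // NUL: C0 80 instead of 00
            "\uD83D\uDE00"              // U+1F600: two 3-byte surrogates instead of 4 bytes
        };
        for (String s : samples) {
            System.out.println("modified: " + hex(modifiedUtf8(s)));
            System.out.println("standard: " + hex(s.getBytes(StandardCharsets.UTF_8)));
        }
    }
}
```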
So once your source code is compiled, its text is stored in a known, platform-independent encoding (modified UTF-8), and you no longer need to specify the source file encoding.
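What can still vary between Windows, Linux and Solaris is the output step: at run time the literal is an ordinary String, and bytes are only produced again when it is encoded for a file, socket or console. A minimal sketch of that distinction, with hypothetical names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OutputEncodingDemo {

    // "Bœuf fûmé" as Unicode escapes; at run time this is the same String on every OS.
    static final String ENCODING_TEST = "B\u0153uf f\u00fbm\u00e9";

    // Number of bytes the String occupies once encoded with the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        System.out.println("chars:           " + ENCODING_TEST.length());                                 // 9
        System.out.println("UTF-8 bytes:     " + encodedLength(ENCODING_TEST, StandardCharsets.UTF_8));    // 12
        System.out.println("Cp1252 bytes:    " + encodedLength(ENCODING_TEST, Charset.forName("Cp1252"))); // 9
        // Only this line depends on the machine the program runs on.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Writers and getBytes() calls that do not name a charset fall back to the platform default (UTF-8 by default since JDK 18, per JEP 400), and a console can still mis-render correct bytes if its own encoding differs. Naming the charset explicitly keeps the encoded output identical everywhere.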
In general, section 4.4 of the JVM specification explains how the constant pool works and how strings, class/field/method names, etc. are represented by a CONSTANT_Utf8_info structure.
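As a rough illustration of section 4.4, the sketch below walks the constant pool of a class file and collects every CONSTANT_Utf8_info entry; because DataInputStream.readUTF reads exactly a u2 length followed by modified UTF-8, it decodes those entries directly. It is a sketch under stated assumptions rather than a full parser: it stops after the constant pool, handles the constant-pool tags defined as of Java SE 11, needs Java 14+ for the switch syntax, and the hypothetical syntheticClassPrefix helper fabricates a truncated class file purely so the parser can be tried without compiling anything.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ConstantPoolDump {

    // Returns every CONSTANT_Utf8_info entry of a class file, in pool order.
    static List<String> utf8Constants(byte[] classBytes) {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) throw new IllegalArgumentException("not a class file");
            in.readUnsignedShort();             // minor_version
            in.readUnsignedShort();             // major_version
            int count = in.readUnsignedShort(); // constant_pool_count; valid indices are 1..count-1
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1 -> result.add(in.readUTF());            // Utf8: u2 length + modified UTF-8
                    case 3, 4 -> in.skipBytes(4);                  // Integer, Float
                    case 5, 6 -> { in.skipBytes(8); i++; }         // Long, Double use two slots
                    case 7, 8, 16, 19, 20 -> in.skipBytes(2);      // Class, String, MethodType, Module, Package
                    case 15 -> in.skipBytes(3);                    // MethodHandle
                    case 9, 10, 11, 12, 17, 18 -> in.skipBytes(4); // member refs, NameAndType, (Invoke)Dynamic
                    default -> throw new IllegalArgumentException("unknown tag " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Hypothetical helper: builds a truncated class file (header + constant pool only)
    // holding one Long constant plus a Utf8/String pair for each literal.
    static byte[] syntheticClassPrefix(String... literals) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);                           // minor_version
            out.writeShort(52);                          // major_version (Java 8)
            out.writeShort(1 + 2 + 2 * literals.length); // constant_pool_count
            out.writeByte(5);                            // CONSTANT_Long occupies slots 1 and 2
            out.writeLong(42L);
            int index = 3;
            for (String s : literals) {
                out.writeByte(1);                        // CONSTANT_Utf8 at slot index
                out.writeUTF(s);
                out.writeByte(8);                        // CONSTANT_String referring to it
                out.writeShort(index);
                index += 2;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = args.length > 0
                ? Files.readAllBytes(Path.of(args[0]))            // a real compiled .class file
                : syntheticClassPrefix("B\u0153uf f\u00fbm\u00e9"); // "Bœuf fûmé"
        for (String s : utf8Constants(bytes)) {
            System.out.println(s);
        }
    }
}
```

Pointed at a real .class file compiled with javac -encoding Cp1252 (for example with the Java 11+ source launcher: java ConstantPoolDump.java Test.class), it should list the literal among the field names, descriptors and other UTF-8 constants; how it then appears on screen depends only on the console's encoding.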