
Java source files - Is encoding still relevant once compiled?

Tags: java, javac

I know that loading a file in Java without specifying the encoding is platform dependent. But my question is about the text contained in the .java source files themselves: is the encoding used for those files still relevant once compiled?

For example, if I have a test.java file on Windows which is Cp1252 encoded and contains:

private String encodingTest = "Bœuf fûmé";

If I compile it using -encoding Cp1252, what happens exactly to this text in the resulting .class? Does the encoding still matter? Or is the encoding standardized by Java when compiling?

Will the resulting .class be platform dependent? Can I get a different result if I output this text on Windows, Linux, or Solaris? Can an encoding configuration on the server impact the rendering of this text in one way or another?

electrotype asked May 24 '14 10:05


1 Answer

The source code encoding is very relevant while compiling, as the OP says in his post. However, after compiling, all literal text is stored as modified UTF-8 encoded strings.

All string literals, as well as class, method, and field names and references to them, are stored in the constant pool of the .class file in modified UTF-8:

From the JVM spec (for Java version 1.7):

4.4.7. The CONSTANT_Utf8_info Structure

The CONSTANT_Utf8_info structure is used to represent constant string values:

[...]

String content is encoded in modified UTF-8. Modified UTF-8 strings are encoded so that code point sequences that contain only non-null ASCII characters can be represented using only 1 byte per code point, but all code points in the Unicode codespace can be represented.
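To get a feel for what "modified UTF-8" means in practice, here is a minimal sketch. It relies on DataOutputStream.writeUTF, which is documented to write modified UTF-8 behind a 2-byte length prefix. The class and method names are made up for illustration, and the literal uses Unicode escapes so the file compiles the same way under any -encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class ModifiedUtf8Demo {

    // Encodes a string as modified UTF-8 via DataOutputStream.writeUTF,
    // then drops the 2-byte length prefix that writeUTF puts in front.
    public static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buffer.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "B\u0153uf f\u00fbm\u00e9"; // "Bœuf fûmé"

        // For this text, modified UTF-8 and standard UTF-8 give the same bytes:
        // 42 C5 93 75 66 20 66 C3 BB 6D C3 A9
        System.out.println(hex(modifiedUtf8(text)));
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_8)));

        // The null character is one place where the two differ:
        System.out.println(hex(modifiedUtf8("\u0000")));                         // C0 80
        System.out.println(hex("\u0000".getBytes(StandardCharsets.UTF_8)));      // 00
    }
}
```

The point is that none of these bytes depend on Cp1252 or on the platform: once the characters have been decoded from the source file, their encoding in the constant pool is fixed by the specification.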

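So once your source code is compiled, its text is stored in a known character encoding (modified UTF-8), and the source file encoding no longer matters. The resulting .class file is platform independent: the JVM decodes the same characters from it on Windows, Linux, or Solaris. What can still differ is the encoding used when the program writes that text out (to a console, file, or response), but that is a runtime setting and has nothing to do with how the source file was encoded.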
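As a rough illustration of where encoding can still bite at runtime, the sketch below encodes the same in-memory string with two different charsets. The class and method names are hypothetical; the only assumption is that output eventually goes through some charset, whether chosen explicitly or taken from the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class OutputEncodingDemo {

    // "Bœuf fûmé", written with Unicode escapes so the compiled
    // constant is the same whatever -encoding javac is given.
    public static final String TEXT = "B\u0153uf f\u00fbm\u00e9";

    // Encodes the text with the given charset and decodes it again,
    // which is what effectively happens when output is written and read back.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // UTF-8 can represent every character, so the text survives intact.
        System.out.println(roundTrip(TEXT, StandardCharsets.UTF_8).equals(TEXT)); // true

        // ISO-8859-1 has no 'œ' (U+0153); getBytes substitutes '?' for it.
        System.out.println(roundTrip(TEXT, StandardCharsets.ISO_8859_1).equals(TEXT)); // false

        // The byte counts differ too: 12 bytes in UTF-8, 9 in ISO-8859-1.
        System.out.println(TEXT.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(TEXT.getBytes(StandardCharsets.ISO_8859_1).length);
    }
}
```

In both cases the String inside the JVM is identical; only the charset chosen at output time changes what ends up on the wire or on screen.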

In general, section 4.4 of the JVM specification explains how the constant pool works and that the text of string literals and of class, field, and method names is held in CONSTANT_Utf8_info structures.
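If you want to check this on your own .class file, something along the following lines should work. It is only a sketch: the class name ConstantPoolStrings and the path passed on the command line are made up, the tag sizes are taken from section 4.4 of the JVM specification (including tags added after Java 7), and it stops reading as soon as the constant pool ends. DataInputStream.readUTF reads a 2-byte length followed by modified UTF-8, which is the layout of a CONSTANT_Utf8_info entry after its tag byte.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class ConstantPoolStrings {

    // Returns the decoded text of every CONSTANT_Utf8_info entry in a class file.
    public static List<String> utf8Entries(byte[] classFile) {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();             // minor_version
            in.readUnsignedShort();             // major_version
            int count = in.readUnsignedShort(); // constant_pool_count
            // Entries are numbered 1 .. count-1.
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1:  // Utf8: u2 length + modified UTF-8 bytes
                        result.add(in.readUTF());
                        break;
                    case 3: case 4:             // Integer, Float
                    case 9: case 10: case 11:   // Fieldref, Methodref, InterfaceMethodref
                    case 12:                    // NameAndType
                    case 17: case 18:           // Dynamic, InvokeDynamic
                        in.skipBytes(4);
                        break;
                    case 5: case 6:             // Long, Double occupy two pool slots
                        in.skipBytes(8);
                        i++;
                        break;
                    case 7: case 8:             // Class, String
                    case 16: case 19: case 20:  // MethodType, Module, Package
                        in.skipBytes(2);
                        break;
                    case 15:                    // MethodHandle
                        in.skipBytes(3);
                        break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a compiled class, for example the .class built from test.java
        for (String s : utf8Entries(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(s);
        }
    }
}
```

Run against the class compiled from the question's example, the literal should show up in the list alongside the field name encodingTest, whatever -encoding was used to compile it, as long as that option matched the actual source file encoding.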

Erwin Bolwidt answered Nov 11 '22 14:11