
What exactly is sun.jnu.encoding?

Tags: java, encoding

See http://happygiraffe.net/blog/2009/09/24/java-platform-encoding/

That post describes how the sun.jnu.encoding property determines the encoding used to decode values passed on the command line, something that setting the file.encoding property doesn't influence.
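One way to check this on your own machine is to print the arguments the JVM actually received. The class below is only a minimal sketch (ArgDump and codePoints are names I made up for illustration); it prints sun.jnu.encoding and then each argument as a list of Unicode code points, so a wrongly decoded argument shows up as unexpected code points.

```java
public class ArgDump {
    // Render every code point of the string as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Internal, undocumented property; may be null on non-Sun/Oracle-derived JVMs.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        for (String arg : args) {
            System.out.println(arg + " -> " + codePoints(arg));
        }
    }
}
```

Running it with a non-ASCII argument under different locale settings should show whether the argument survives intact; what you see will depend on your platform and JDK version.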


I did some investigation into encoding, and as per my analysis:

  1. sun.jnu.encoding affects how file names are created (it is possibly derived from LANG on Unix before Java starts)
  2. file.encoding affects the content of a file
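To illustrate the second point, here is a small sketch (FileContentEncoding and writeAndReadRaw are hypothetical names, not from any library) that writes the same text to a temporary file with two different charsets and counts the bytes that end up on disk. It passes the charset explicitly, so it only shows that the content bytes follow the charset in use; the file name itself is plain ASCII here and is not affected.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileContentEncoding {
    // Write text to a temp file using the given charset, then return the raw bytes on disk.
    static byte[] writeAndReadRaw(String text, Charset charset) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt");
            try {
                try (Writer w = Files.newBufferedWriter(tmp, charset)) {
                    w.write(text);
                }
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println("UTF-8:      " + writeAndReadRaw(text, StandardCharsets.UTF_8).length + " byte(s)");
        System.out.println("ISO-8859-1: " + writeAndReadRaw(text, StandardCharsets.ISO_8859_1).length + " byte(s)");
    }
}
```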

I believe that this value represents the system encoding, which may be different from the user encoding ("file.encoding") on some platforms. The "sun" prefix makes me suspect that this is an implementation detail specific to the Sun JRE (a quick look at an IBM 1.4 VM shows an "ibm.system.encoding" system property). I have no idea how this might be used internally, though I'm sure a quick look through the source would yield some clues.


I tried to centralize all the information provided in the answers here and elsewhere on the Web, in order to make the most complete answer possible.

As other comments have noted, there are actually two properties that affect the chosen encoding on the JVM:

  • sun.jnu.encoding, also known as the "platform encoding" or "JNU encoding", is an undocumented, internal property that holds the name of the encoding used for interacting with the platform (e.g. file paths and JNI C string to Java String conversions; possibly also command-line arguments, main class names and environment variables, but I wasn't able to verify this claim).

    On macOS this is always UTF-8; on Linux it is always the same as file.encoding (unless file.encoding is overridden, in which case I do not know what happens); on Windows it can vary.

  • file.encoding, also known as the "default charset" or "user encoding", is mainly used to determine the charset for encoding/decoding file contents. This is the charset that java.nio.charset.Charset.defaultCharset() returns. Note that many JDK APIs use the value of file.encoding as their default encoding, but it can be overridden per call by passing an explicit Charset or a charset name to the JDK method.
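As a rough illustration of the two properties and of the per-call override, the sketch below prints both property values next to Charset.defaultCharset(), then encodes the same string with the default charset, an explicit Charset, and a charset name. CharsetProbe and encodedLength are names invented for this example.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetProbe {
    // Number of bytes the text occupies when encoded with an explicit charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // sun.jnu.encoding is internal and may be absent (null) on some JVMs.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset() = " + Charset.defaultCharset());

        String sample = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE

        // No charset given: the default charset is used, so the result is platform-dependent.
        System.out.println("default charset: " + sample.getBytes().length + " byte(s)");

        // Explicit Charset object: independent of the default.
        System.out.println("UTF-8:           " + encodedLength(sample, StandardCharsets.UTF_8) + " byte(s)");

        // Explicit charset name: also independent of the default.
        System.out.println("ISO-8859-1:      " + sample.getBytes("ISO-8859-1").length + " byte(s)");
    }
}
```

The first three lines and the "default charset" line will differ between platforms and JDK versions; the two explicit calls should not.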

These properties are determined dynamically when the JVM starts (though this is not the case for GraalVM Native Image, which sets them at build time as of this writing).

Finally, as this draft JEP states:

"The value of these system properties can be overridden on the command line although doing so has never been supported."