Is UTF-8 the default encoding in Java?
If not, how can I know which encoding is used by default?
The native character encoding of the Java programming language is UTF-16. A charset in the Java platform therefore defines a mapping between sequences of sixteen-bit UTF-16 code units (that is, sequences of chars) and sequences of bytes.
encoding attribute, Java uses “UTF-8” character encoding by default. Character encoding basically interprets a sequence of bytes into a string of specific characters. The same combination of bytes can denote different characters in different character encoding.
Linux represents Unicode using the 8-bit Unicode Transformation Format (UTF-8). UTF-8 is a variable length encoding of Unicode. It uses 1 byte to code 7 bits, 2 bytes for 11 bits, 3 bytes for 16 bits, 4 bytes for 21 bits, 5 bytes for 26 bits, 6 bytes for 31 bits.
The default character set of the JVM is that of the system it's running on. There's no specific value for this and you shouldn't generally depend on the default encoding being any particular value.
It can be accessed at runtime via Charset.defaultCharset()
, if that's any use to you, though really you should make a point of always specifying encoding explicitly when you can do so.
Note that you can change the default encoding of the JVM using the confusingly-named property file.encoding
.
If your application is particularly sensitive to encodings (perhaps through usage of APIs implying default encodings), then you should explicitly set this on JVM startup to a consistent (known) value.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With