from java.lang.StringCoding :
String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
This is what String.getBytes() ends up using. On Linux with JDK 7, I was always under the impression that UTF-8 is the default charset?
Thanks
UTF-8 is a multibyte encoding that can represent any Unicode character. ISO 8859-1 is a single-byte encoding that can represent the first 256 Unicode characters. Both encode ASCII exactly the same way.
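To make the ASCII overlap concrete, here is a minimal sketch. The class and method names (AsciiOverlap, sameInBothEncodings) are made up for illustration; only String.getBytes(Charset), StandardCharsets and Arrays.equals are standard API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiOverlap {
    // Hypothetical helper: true when the text encodes to identical bytes
    // in UTF-8 and ISO-8859-1.
    static boolean sameInBothEncodings(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(utf8, latin1);
    }

    public static void main(String[] args) {
        // Pure ASCII: both encodings produce the same bytes.
        System.out.println(sameInBothEncodings("hello"));     // true
        // U+00E9 is one byte in ISO-8859-1 but two bytes in UTF-8.
        System.out.println(sameInBothEncodings("caf\u00e9")); // false
    }
}
```

So the two encodings only diverge once the text leaves the ASCII range.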
getBytes() converts a string into a sequence of bytes and returns them as a byte array. Called with no arguments, it uses the platform's default charset to encode the string.
We can also pass a specific charset to be used in the encoding process, either as a Charset object or as a String naming the charset.
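As a rough illustration of the overloads, the sketch below calls getBytes with a Charset and with a charset name. The class name GetBytesOverloads and its helper methods are assumptions for the example, not part of any library.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesOverloads {
    // Overload taking a Charset object: no checked exception.
    static byte[] byCharset(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Overload taking a charset name: declares UnsupportedEncodingException.
    static byte[] byName(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the charsets every Java platform must support.
            throw new IllegalStateException("UTF-8 unavailable", e);
        }
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo";
        // Both overloads produce the same bytes for the same charset.
        System.out.println(Arrays.equals(byCharset(text), byName(text))); // true
        // The no-argument overload depends on the platform default charset,
        // so its result can differ from machine to machine.
        System.out.println(text.getBytes().length);
    }
}
```

The Charset overload is usually the more convenient of the two, since it avoids the checked exception.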
UTF stands for Unicode Transformation Format. The '8' signifies that it uses 8-bit blocks to encode a character. The number of blocks needed to represent a character varies from 1 to 4. To convert a String into UTF-8 bytes in Java, we call getBytes() with UTF-8 as the charset.
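The 1-to-4 byte range can be checked directly. This is a small sketch; Utf8Widths and utf8ByteCount are names invented for the example.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {
    // Hypothetical helper: number of bytes the text occupies in UTF-8.
    static int utf8ByteCount(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8ByteCount("A"));            // 1 (U+0041)
        System.out.println(utf8ByteCount("\u00e9"));       // 2 (U+00E9)
        System.out.println(utf8ByteCount("\u20ac"));       // 3 (U+20AC, euro sign)
        System.out.println(utf8ByteCount("\uD83D\uDE00")); // 4 (U+1F600, a surrogate pair in Java)
    }
}
```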
Java tries to use the default character encoding to return bytes using String.getBytes().
.... Here is the tricky part (which is probably never going to come into play) ....
If the system cannot decode or encode strings using the default charset (UTF-8 or another one), then there will be a fallback to ISO-8859-1. If the fallback does not work ... the system will fail!
.... Really ... (gasp!) ... Could it crash if my specified charset cannot be used, and UTF-8 or ISO-8859-1 are also unusable?
Yes. The Java source comments state in the StringCoding.encode(...) method:
// If we can not find ISO-8859-1 (a required encoding) then things are seriously wrong with the installation.
... and then it calls System.exit(1)
It is possible, although not probable, that the user's JVM may not support decoding and encoding in UTF-8 or in the charset specified on JVM startup.
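The fallback idea can be imitated in user code. To be clear, this is only a sketch of the pattern, not the JDK's StringCoding implementation; the class CharsetFallback and the method resolveOrLatin1 are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallback {
    // Hypothetical helper: look up a charset by name and fall back to
    // ISO-8859-1 when the name is null, illegal, or unsupported.
    static Charset resolveOrLatin1(String charsetName) {
        if (charsetName == null) {
            return StandardCharsets.ISO_8859_1;
        }
        try {
            return Charset.forName(charsetName);
        } catch (IllegalArgumentException e) {
            // Covers both IllegalCharsetNameException and
            // UnsupportedCharsetException.
            return StandardCharsets.ISO_8859_1;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveOrLatin1("UTF-8"));           // UTF-8
        System.out.println(resolveOrLatin1("no-such-charset")); // ISO-8859-1
        System.out.println(resolveOrLatin1(null));              // ISO-8859-1
    }
}
```

Unlike the JDK code quoted above, this sketch never exits the JVM, because StandardCharsets.ISO_8859_1 is a constant that is always present.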
Then, is the default charset used properly in the String class during getBytes()?
No. However, the better question is ...
The contract as defined in the Javadoc is correct.
The behavior of this method when this string cannot be encoded in the default charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.
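For reference, here is one way the CharsetEncoder route might look. It is a sketch under the assumption that you want encoding to fail loudly instead of silently substituting characters; StrictEncoding, encodeStrict and canEncodeStrict are made-up names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncoding {
    // Hypothetical helper: encode the text, throwing if any character
    // cannot be represented in the given charset.
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    // Hypothetical helper: true when encodeStrict succeeds.
    static boolean canEncodeStrict(String text, Charset charset) {
        try {
            encodeStrict(text, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canEncodeStrict("abc", StandardCharsets.US_ASCII));      // true
        // The euro sign has no mapping in ISO-8859-1, so strict encoding fails.
        System.out.println(canEncodeStrict("\u20ac", StandardCharsets.ISO_8859_1)); // false
        // By contrast, getBytes(Charset) replaces the unmappable character.
        System.out.println("\u20ac".getBytes(StandardCharsets.ISO_8859_1).length);  // 1
    }
}
```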
It is always advised to explicitly specify "ISO-8859-1" or "US-ASCII" or "UTF-8" or whatever character set you want when converting bytes into Strings or vice versa -- unless -- you have previously obtained the default charset and made 100% sure it is the one you need.
Use this method instead:
public byte[] getBytes(String charsetName)
To find the default for your system, just use:
Charset.defaultCharset()
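A quick way to try this out is sketched below. The class DefaultCharsetCheck and its helpers are illustrative names only, and the printed default will vary with the JVM and locale settings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {
    // Hypothetical helper: true when the platform default matches the expected charset.
    static boolean defaultIs(Charset expected) {
        return expected.equals(Charset.defaultCharset());
    }

    // Hypothetical helper: encode with an explicit charset so the result
    // does not depend on the platform default.
    static byte[] encodeExplicitly(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Output depends on the machine, so no expected value is shown.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Default is UTF-8: " + defaultIs(StandardCharsets.UTF_8));
        // Always 3 bytes for the euro sign, whatever the default is.
        System.out.println(encodeExplicitly("\u20ac").length); // 3
    }
}
```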
Hope that helps.