Why does Java's String.getBytes() use "ISO-8859-1"?

Tags:

java

character-encoding

utf-8

iso-8859-1

From java.lang.StringCoding:

String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;

This is what String.getBytes() uses. On Linux with JDK 7, I was always under the impression that UTF-8 is the default charset?

Thanks

332

asked Sep 30 '12 07:09

Amnon

1 Answer

It is a bit complicated ...

String.getBytes() tries to encode the string using the default character encoding.

The default charset is taken from the file.encoding system property.
It is cached, so changing the property via System.setProperty(..) after the JVM starts has no effect.
If the file.encoding property does not map to a known charset, then UTF-8 is used.
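
A minimal sketch of the caching behaviour described in the points above. The class and method names (DefaultCharsetDemo, defaultSurvivesPropertyChange) are made up for illustration; only System.getProperty, System.setProperty, System.clearProperty and Charset.defaultCharset() are real JDK calls.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    // Hypothetical helper: returns true if changing file.encoding at runtime
    // leaves the already-resolved default charset untouched.
    static boolean defaultSurvivesPropertyChange() {
        // First call resolves the default charset (and caches it).
        Charset before = Charset.defaultCharset();
        String original = System.getProperty("file.encoding");
        // Pick a value that differs from the current default.
        String other = "UTF-8".equalsIgnoreCase(before.name()) ? "ISO-8859-1" : "UTF-8";
        try {
            System.setProperty("file.encoding", other);
            return before.equals(Charset.defaultCharset());
        } finally {
            // Restore the property so the demo has no lasting side effect.
            if (original != null) {
                System.setProperty("file.encoding", original);
            } else {
                System.clearProperty("file.encoding");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("unchanged after setProperty: " + defaultSurvivesPropertyChange());
    }
}
```

To actually change the default, the property has to be set when the JVM is launched, for example with -Dfile.encoding=UTF-8 on the command line.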

.... Here is the tricky part (which is probably never going to come into play) ....

If the system cannot decode or encode strings using the default charset (UTF-8 or another one), then there will be a fallback to ISO-8859-1. If the fallback does not work ... the system will fail!

.... Really ... (gasp!) ... Could it crash if my specified charset cannot be used, and UTF-8 or ISO-8859-1 are also unusable?

Yes. A comment in the Java source of the StringCoding.encode(...) method states:

// If we can not find ISO-8859-1 (a required encoding) then things are seriously wrong with the installation.

... and then it calls System.exit(1).
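
The fallback chain described above can be sketched roughly like this. This is an illustration only, not the JDK source: FallbackSketch and encodeWithFallback are hypothetical names, the preferred charset is passed in as a parameter rather than read from the default, and the sketch throws an exception where the JDK is said to call System.exit(1).

```java
import java.io.UnsupportedEncodingException;

public class FallbackSketch {
    // Hypothetical helper, not the JDK's code:
    // try the preferred charset first, then fall back to ISO-8859-1.
    static byte[] encodeWithFallback(String s, String preferredCharsetName) {
        try {
            return s.getBytes(preferredCharsetName);
        } catch (UnsupportedEncodingException e) {
            // Preferred charset is not available; fall through to the required one.
        }
        try {
            return s.getBytes("ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // According to the answer, the JDK exits at this point.
            // The sketch throws instead so the failure stays visible to callers.
            throw new IllegalStateException("ISO-8859-1 charset not available", e);
        }
    }

    public static void main(String[] args) {
        // "no-such-charset" is an assumed name that no JVM should support.
        byte[] bytes = encodeWithFallback("\u00e9", "no-such-charset");
        System.out.println(bytes.length); // 1, because U+00E9 is a single byte in ISO-8859-1
    }
}
```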

So, why is there an intentional fallback to ISO-8859-1 in the getBytes() method?

It is possible, although not probable, that the user's JVM may not support decoding and encoding in UTF-8 or the charset specified on JVM startup.

Then, is the default charset used properly in the String class during getBytes()?

No. However, the better question is ...

Does String.getBytes() deliver what it promises?

The contract as defined in the Javadoc is correct.

The behavior of this method when this string cannot be encoded in the default charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.
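
As a sketch of what "more control" can look like, the example below configures a CharsetEncoder to report characters it cannot encode instead of silently replacing them. StrictEncodeDemo and encodeStrict are illustrative names; the encoder calls themselves are standard java.nio.charset API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class StrictEncodeDemo {
    // Hypothetical helper: encodes s with cs, throwing if any character cannot be encoded.
    static byte[] encodeStrict(String s, Charset cs) throws CharacterCodingException {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        try {
            // The euro sign (U+20AC) has no mapping in ISO-8859-1.
            encodeStrict("\u20ac", latin1);
            System.out.println("encoded");
        } catch (CharacterCodingException e) {
            System.out.println("cannot encode: " + e);
        }
    }
}
```

By contrast, calling getBytes with ISO-8859-1 on the same string does not throw; it substitutes the charset's replacement byte, so the data loss goes unnoticed.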

The good news (and better way of doing things)

It is always advised to explicitly specify "ISO-8859-1" or "US-ASCII" or "UTF-8" or whatever character set you want when converting bytes into Strings or vice versa -- unless -- you have previously obtained the default charset and made 100% sure it is the one you need.

Use this method instead:

public byte[] getBytes(String charsetName)
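
A short sketch of calling it, with ExplicitCharsetDemo and its helper methods as made-up names. It also shows the getBytes(Charset) overload with the StandardCharsets constants (available from Java 7), which avoids the checked UnsupportedEncodingException.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Charset given by name: the compiler forces you to handle UnsupportedEncodingException.
    static byte[] utf8ByName(String s) throws UnsupportedEncodingException {
        return s.getBytes("UTF-8");
    }

    // Charset given as a constant: no checked exception, no risk of a misspelled name.
    static byte[] utf8ByConstant(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u00e9"; // e with acute accent
        System.out.println(utf8ByName(s).length);     // 2
        System.out.println(utf8ByConstant(s).length); // 2
        System.out.println(latin1(s).length);         // 1
    }
}
```

The same string yields different byte sequences per charset, which is why relying on whatever the default happens to be is fragile.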

To find the default for your system, just use:

Charset.defaultCharset()

Hope that helps.

157

answered Oct 01 '22 05:10

The Coordinator
