What's the problem with this code? I made an ISO-8859 String, so most of the ÄÖÜ come out as some cryptic output. That's fine. But how do I convert them back to normal characters (UTF-8 or something)?
String s = new String("Üü?öäABC".getBytes(), "ISO-8859-15");
System.out.println(s);
//ÃÃŒ?öÀABC => ok(?)
System.out.println(new String(s.getBytes(), "ISO-8859-15"));
//ÃÂÃÅ?öÃâ¬ABC => ok(?)
System.out.println(new String(s.getBytes(), "UTF-8"));
//ÃÃŒ?öÀABC => huh?
A construct such as new String("Üü?öäABC".getBytes(), "ISO-8859-15"); is almost always an error.
What you're doing here is taking a String object, getting the corresponding byte[] in the platform default encoding and re-interpreting it as ISO-8859-15 to convert it back to a String.
If the platform default encoding happens to be ISO-8859-15 (or near enough to make no difference for this particular String, for example ISO-8859-1), then it is a no-op (i.e. it has no real effect).
In all other cases it will most likely destroy the String.
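To make the no-op and destructive cases concrete, here is a minimal sketch with both charsets spelled out instead of relying on the platform default. The class and method names (RoundTripDemo, reinterpret) are made up for illustration, and the umlauts are written as Unicode escapes so the result does not depend on how the source file itself is encoded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Hypothetical helper: encode with one charset, decode with another.
    // This is the pattern from the question, with both charsets made explicit.
    static String reinterpret(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // "ÜüöäABC" written with escapes to sidestep source-file encoding issues
        String original = "\u00DC\u00FC\u00F6\u00E4ABC";
        Charset latin9 = Charset.forName("ISO-8859-15");

        // Same charset in both directions: a no-op.
        System.out.println(reinterpret(original, latin9, latin9).equals(original)); // true

        // ISO-8859-1 agrees with ISO-8859-15 for these particular characters: also a no-op.
        System.out.println(
                reinterpret(original, StandardCharsets.ISO_8859_1, latin9).equals(original)); // true

        // UTF-8 bytes read as ISO-8859-15: each umlaut turns into two characters.
        String mangled = reinterpret(original, StandardCharsets.UTF_8, latin9);
        System.out.println(mangled.length()); // 11 instead of 7

        // ISO-8859-15 bytes read as UTF-8: the bytes are malformed UTF-8 and are
        // replaced by U+FFFD, so the original characters cannot be recovered.
        String lost = reinterpret(original, latin9, StandardCharsets.UTF_8);
        System.out.println(lost.contains("\uFFFD")); // true
    }
}
```

The last case is the one that cannot be undone: once the bytes have been replaced by U+FFFD, no later conversion brings the umlauts back.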
If you try to "fix" a String, then you're probably too late: if you have to use a specific encoding to read data, then you should use it at the point where binary data is converted to String data. For example if you read from an InputStream, you need to pass the correct encoding to the constructor of the InputStreamReader.
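A small sketch of that advice, assuming the input really is ISO-8859-15. The class name, the readFirstLine helper and the in-memory byte array standing in for a file are illustrative only; with a real file you would pass a FileInputStream instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ReadWithCharset {
    // Hypothetical helper: decodes the stream with the given charset at the
    // point where bytes become characters, and returns the first line.
    static String readFirstLine(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for file content: "Üü" followed by "ABC", encoded as ISO-8859-15
        byte[] fileContent = { (byte) 0xDC, (byte) 0xFC, 0x41, 0x42, 0x43 };
        String line = readFirstLine(new ByteArrayInputStream(fileContent),
                Charset.forName("ISO-8859-15"));
        System.out.println(line.equals("\u00DC\u00FCABC")); // true
    }
}
```

Because the charset is applied when the bytes are decoded, the resulting String is correct from the start and needs no later repair.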
Trying to fix the problem "after the fact" will be difficult or even impossible (decoding a byte[] with the wrong encoding can be a destructive operation).
I hope this will solve your problem.
String readable = "äöüÄÖÜßáéíóúÁÉÍÓÚàèìòùÀÈÌÒÙñÑ";
try {
String unreadable = new String(readable.getBytes("UTF-8"), "ISO-8859-15");
// unreadable -> äöüÃÃÃÃáéÃóúÃÃÃÃÃà èìòùÃÃÃÃÃñÃ
} catch (UnsupportedEncodingException e) {
// handle error
}
And:
String unreadable = "äöüÃÃÃÃáéÃóúÃÃÃÃÃà èìòùÃÃÃÃÃñÃ";
try {
String readable = new String(unreadable.getBytes("ISO-8859-15"), "UTF-8");
// readable -> äöüÄÖÜßáéíóúÁÉÍÓÚàèìòùÀÈÌÒÙñÑ
} catch (UnsupportedEncodingException e) {
// ...
}
String s = new String("Üü?öäABC".getBytes(), "ISO-8859-15"); //bug
All this code does is corrupt data. It transcodes UTF-16 data to the system encoding (whatever that is), then takes those bytes, pretends they're valid ISO-8859-15 and transcodes them to UTF-16.
Then how to convert an input String like "ÃÃŒ?öÀABC" to normal? (if I know that the string is from an ISO8859 file).
The correct way to perform this operation would be like this:
byte[] iso859_15 = { (byte) 0xc3, (byte) 0xc3, (byte) 0xbc, 0x3f,
(byte) 0xc3, (byte) 0xb6, (byte) 0xc3, (byte) 0xa4, 0x41, 0x42,
0x43 };
String utf16 = new String(iso859_15, Charset.forName("ISO-8859-15"));
Strings in Java are always UTF-16. All other encodings must be represented using the byte type.
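As a rough illustration of that point: the same one-character String produces different byte arrays depending on the charset you ask for. The class and helper names are made up for this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLengths {
    // Hypothetical helper: number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u00DC"; // "Ü" as a single UTF-16 code unit
        System.out.println(text.length());                                        // 1
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1));     // 1
        System.out.println(encodedLength(text, Charset.forName("ISO-8859-15")));  // 1
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));          // 2
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE));       // 2
    }
}
```

The String itself has no "ISO-8859-15 form"; the encoding only exists in the byte[] that getBytes returns.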
Now, if you use System.out to output the resultant string, that might not appear correctly, but that is a different transcoding issue. For example, the Windows console default encoding doesn't match the system encoding. The encoding used by System.out must match the encoding of the device receiving the data. You should also take care to ensure that you are reading your source files with the same encoding your editor is using.
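One possible way to control the output side is to wrap System.out in a PrintStream with an explicit encoding. This is only a sketch: it assumes the receiving terminal expects UTF-8, which you would have to check for your own console, and the class and helper names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleEncoding {
    // Hypothetical helper: a PrintStream that encodes characters with the named charset.
    static PrintStream withEncoding(OutputStream out, String charsetName) {
        try {
            return new PrintStream(out, true, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    // Hypothetical helper: the bytes such a PrintStream would send to the device.
    static byte[] bytesWritten(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream stream = withEncoding(buffer, charsetName);
        stream.print(text);
        stream.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: the terminal interprets incoming bytes as UTF-8.
        PrintStream out = withEncoding(System.out, "UTF-8");
        out.println("\u00DC\u00FC\u00F6\u00E4ABC");

        // The same character is sent as different bytes depending on the encoding.
        System.out.println(bytesWritten("\u00DC", "UTF-8").length);       // 2
        System.out.println(bytesWritten("\u00DC", "ISO-8859-15").length); // 1
    }
}
```

If the terminal uses a different encoding than the one chosen here, the output will look garbled even though the String in memory is correct.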
To understand how treatment of character data varies between languages, read this.