Strings are immutable in Java, which means we cannot change the character encoding of an existing String. To achieve what we want, we need to encode the String into bytes using the desired charset and, if a String is needed again, decode those bytes into a new one.
In Python, by contrast, UTF-8 data is decoded with the decode() method of a bytes object, which accepts two arguments, encoding and errors. Java has no decode() method on String; the closest equivalent is new String(bytes, StandardCharsets.UTF_8).
By far the most popular character encoding today is UTF-8, which is defined by the Unicode standard. How quickly can we check whether a sequence of bytes is valid UTF-8? Any ASCII string is a valid UTF-8 string.
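One way to do that check in Java, as a rough sketch assuming the bytes are already in memory: run them through a strict CharsetDecoder and see whether it reports an error. The names Utf8Validator and isValidUtf8 are made up for this example, and nothing here is tuned for speed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Validator {
    // Hypothetical helper: true if the whole array decodes as UTF-8 without errors.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] ascii = "hello".getBytes(StandardCharsets.US_ASCII);
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] broken = {(byte) 0xC3, (byte) 0x28};
        System.out.println(isValidUtf8(ascii));  // prints true
        System.out.println(isValidUtf8(broken)); // prints false
    }
}
```

Note that new String(bytes, UTF_8) would not work for this purpose, because it silently replaces malformed input instead of reporting it.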
How about using:
ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(myString);
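If you go the ByteBuffer route and need a plain byte[] afterwards, something along these lines should work. It is only a sketch; Utf8Encode and toUtf8 are names invented for the example. The copy uses remaining() rather than the backing array, since the backing array may be larger than the encoded data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Encode {
    // Hypothetical helper: encode a String as UTF-8 and copy out exactly the encoded bytes.
    static byte[] toUtf8(String s) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(s);
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] viaBuffer = toUtf8("\u00e9");
        byte[] viaGetBytes = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(viaBuffer, viaGetBytes)); // prints true
    }
}
```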
String objects in Java use the UTF-16 encoding that can't be modified. The only thing that can have a different encoding is a byte[]. So if you need UTF-8 data, then you need a byte[]. If you have a String that contains unexpected data, then the problem is at some earlier place that incorrectly converted some binary data to a String (i.e. it was using the wrong encoding).
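To make the String versus byte[] point concrete, here is a small sketch that encodes the same one-character String with three charsets and prints the resulting bytes. EncodingBytes and hex are names made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodingBytes {
    // Hypothetical helper: format bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // the single character e with acute accent
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // prints C3 A9
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // prints E9
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));   // prints 00 E9
    }
}
```

The String is the same in all three calls; only the byte[] differs.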
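In Java 7 and later you can use:
import static java.nio.charset.StandardCharsets.*;
// Recover the raw bytes of a String that was wrongly decoded as ISO-8859-1...
byte[] ptext = myString.getBytes(ISO_8859_1);
// ...and decode those bytes again as UTF-8.
String value = new String(ptext, UTF_8);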
This has the advantage over getBytes(String) that it does not declare throws UnsupportedEncodingException.
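For what it's worth, the ISO_8859_1 then UTF_8 round trip above only helps in one specific situation: the String was produced by decoding UTF-8 bytes as ISO-8859-1. A sketch of that scenario, with MojibakeRepair and repair as names invented for the example:

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {
    // Hypothetical helper: undo a wrong ISO-8859-1 decode of UTF-8 data.
    static String repair(String garbled) {
        // ISO-8859-1 maps every byte to exactly one char, so this recovers the original bytes.
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);       // bytes C3 A9
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1); // two chars: U+00C3 U+00A9
        System.out.println(garbled.length());                  // prints 2
        System.out.println(repair(garbled).equals("\u00e9"));  // prints true
    }
}
```

If the String was garbled in some other way (for example, decoded with a charset that drops or replaces bytes), this trick cannot bring the data back.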
If you're using an older Java version you can declare the charset constants yourself:
import java.nio.charset.Charset;
public class StandardCharsets {
    public static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
    public static final Charset UTF_8 = Charset.forName("UTF-8");
    //....
}
Use byte[] ptext = myString.getBytes("UTF-8"); instead of getBytes(). The no-argument getBytes() uses the so-called "default encoding" of the platform, which may not be UTF-8.
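A quick way to see what the default is on your machine, as a sketch only (DefaultCharsetDemo and utf8Bytes are names made up here). The output of the first two lines depends on the JVM and its settings; as far as I know, Java 18 and later default to UTF-8 unless overridden, while older versions take the default from the operating system.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetDemo {
    // Hypothetical helper: always encodes as UTF-8, whatever the platform default is.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u00e9";
        // Platform dependent: shows which charset the no-argument getBytes() will use.
        System.out.println(Charset.defaultCharset());
        // Platform dependent: true only if the default charset encodes this String like UTF-8.
        System.out.println(Arrays.equals(s.getBytes(), utf8Bytes(s)));
        // Always 2, because the charset is named explicitly.
        System.out.println(utf8Bytes(s).length);
    }
}
```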
A Java String always behaves as a sequence of UTF-16 code units, but you really should think about it like this: an encoding is a way to translate between Strings and bytes.
So if you have an encoding problem, by the time you have a String it is too late to fix it. You need to fix the place where you create that String from a file, database, or network connection.
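In practice, fixing it at the source usually means naming the charset at the point where bytes become chars. A minimal sketch, assuming the incoming data really is UTF-8; ExplicitDecoding and readFirstLine are hypothetical names, and the ByteArrayInputStream stands in for a file or socket stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ExplicitDecoding {
    // Hypothetical helper: read the first line of a stream, decoding it explicitly as UTF-8.
    static String readFirstLine(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for bytes arriving from a file, database, or network connection.
        byte[] incoming = {(byte) 0xC3, (byte) 0xA9};
        String line = readFirstLine(new ByteArrayInputStream(incoming));
        System.out.println(line.equals("\u00e9")); // prints true
    }
}
```

An InputStreamReader created without a charset argument falls back to the platform default, which is the usual way these problems sneak in.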