I'm trying to convert a string with special characters like É into a string with UTF-8 encoding. I tried doing this:
String str = "MARIE-HÉLÈNE";
byte sByte[] = str.getBytes("UTF-8");
str = new String(sByte,"UTF-8");
The problem is, when I do "É".getBytes("UTF-8"), I get 63 which is interpreted as '?' when it's being converted to a new string. How can I fix this issue?
EDIT: I also noticed that this issue was not reproducible on Eclipse, probably because the text file encoding is usually set to UTF-8.
I tried doing byte[] str = "MARIE-HÉLÈNE".getBytes("UTF-8") in http://www.javarepl.com/console.html and got the result byte[] str = [77, 65, 82, 73, 69, 45, 72, 63, 76, 63, 78, 69]
This kind of error happens when the compiler (javac) is not told the encoding of the source file. If your source file is encoded in UTF-8, compile it as follows:
javac -encoding UTF-8 E.java
If the source file is encoded in UTF-16 big-endian instead, compile it as follows:
javac -encoding UTF-16BE E.java
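If changing the compiler flags is not convenient, one way to check whether the source encoding is the culprit is to write the character as a Unicode escape, which is plain ASCII and is therefore read the same way regardless of the -encoding setting. The following is only a sketch; the class name EscapeDemo and the helper utf8Hex are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class EscapeDemo {
    // Hypothetical helper: returns the UTF-8 bytes of s as space-separated lowercase hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The literal below is E-acute written as an ASCII-only Unicode escape,
        // so javac decodes it identically whatever the source file encoding is.
        System.out.println(utf8Hex("\u00C9")); // prints: c3 89
    }
}
```

If the escaped version prints c3 89 but the version with a literal É does not, the source file is most likely being read with the wrong encoding.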
I've confirmed that the program below correctly prints "0xC3 0x89", so there is no problem in your code.
public class E
{
    public static void main(String[] args) throws Exception
    {
        // Encode the single character as UTF-8 (two bytes are expected).
        byte[] bytes = "É".getBytes("UTF-8");

        // Print each byte as two uppercase hex digits.
        for (int i = 0; i < bytes.length; ++i)
        {
            System.out.format("0x%02X ", bytes[i]);
        }

        System.out.println();
    }
}
"É".getBytes("UTF-8") returns a byte[] of 2 bytes: c3 89.
"MARIE-HÉLÈNE" is 4d 41 52 49 45 2d 48 c3 89 4c c3 88 4e 45.
4d 41 52 49 45 2d 48 c3 89 4c c3 88 4e 45
M  A  R  I  E  -  H  É     L  È     N  E
Converting the bytes back using new String(bytes,"UTF-8") will restore the original string.
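The round trip can be sketched as below. This is an illustration rather than the asker's exact code: the class name RoundTrip and its helper methods are hypothetical, the string is written with Unicode escapes so the result does not depend on the source file encoding, and StandardCharsets.UTF_8 is used in place of the "UTF-8" name to avoid the checked exception. The last line shows one way a byte value of 63 ('?') can appear: encoding into a charset that cannot represent É, such as US-ASCII.

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // Hypothetical helper: encode to UTF-8 and decode back again.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: first byte of s when encoded as US-ASCII.
    static int firstAsciiByte(String s) {
        return s.getBytes(StandardCharsets.US_ASCII)[0];
    }

    public static void main(String[] args) {
        // MARIE-HÉLÈNE, with the accented letters written as ASCII-only escapes.
        String original = "MARIE-H\u00C9L\u00C8NE";

        // 12 characters, two of which take two bytes each in UTF-8.
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length); // prints: 14

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(original).equals(original)); // prints: true

        // US-ASCII cannot represent É, so it is replaced by '?' (63).
        System.out.println(firstAsciiByte("\u00C9")); // prints: 63
    }
}
```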