I encountered the following behavior while using a ByteBuffer. It looks like a bug to me, but perhaps I'm using the libraries incorrectly.
Code:
public static void main(String[] args) {
    byte[] byteArray = "hello".getBytes(Charset.forName("UTF-16"));
    CharBuffer buffer = ByteBuffer.wrap(byteArray).asCharBuffer();
    System.out.println(buffer.length());
    for (int i = 0; i < buffer.length(); i++) {
        System.out.print(buffer.get(i));
    }
}
Output:
6
 hello
What's the deal with the leading space? Am I doing something wrong? Is this expected behavior? If so, why?
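Your system most likely uses UTF-8 as its default charset, while you are encoding hello with UTF-16. Note that asCharBuffer() performs no charset decoding; it simply views each pair of bytes as one char. You can check the default charset with: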
System.out.println(System.getProperty("file.encoding")); // UTF-8 on my machine
Since you encode the string with UTF-16, you should also decode the bytes into a CharBuffer with UTF-16:
public static void main(String[] args) {
    byte[] byteArray = "hello".getBytes(Charset.forName("UTF-16"));
    ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
    Charset utf16 = Charset.forName("UTF-16");
    CharBuffer buffer = utf16.decode(byteBuffer);
    System.out.println(buffer.length()); // 5
    for (int i = 0; i < buffer.length(); i++) {
        System.out.print(buffer.get(i)); // hello
    }
}
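To make the difference between the two approaches concrete, here is a minimal sketch that puts them side by side. The class and method names (BomViewDemo, rawView, decoded) are made up for illustration and are not part of the original code. It assumes the default big-endian order of a freshly wrapped ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class BomViewDemo {
    // Hypothetical helper: views the raw bytes as big-endian chars.
    // No charset decoding happens, so the byte order mark stays in the buffer.
    public static CharBuffer rawView(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16);
        return ByteBuffer.wrap(bytes).asCharBuffer();
    }

    // Hypothetical helper: runs the UTF-16 decoder, which consumes the leading BOM.
    public static CharBuffer decoded(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16);
        return StandardCharsets.UTF_16.decode(ByteBuffer.wrap(bytes));
    }

    public static void main(String[] args) {
        CharBuffer view = rawView("hello");
        System.out.println(view.length());               // 6
        System.out.println(view.get(0) == '\uFEFF');     // true: the extra char is the BOM
        System.out.println(decoded("hello").length());   // 5
    }
}
```

The extra character in the view is U+FEFF, which many consoles render as a blank or not at all, which would explain the apparent leading space.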
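If you insist on keeping the original code, you could try placing this line before it so that the system uses UTF-16 as the default charset. Be aware that the JVM reads file.encoding at startup, so setting it at runtime is not guaranteed to take effect, and asCharBuffer() does not consult the default charset in any case: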
System.out.println(System.setProperty("file.encoding", "UTF-16"));
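The UTF-16 encoding is specifically documented to produce a byte order mark (BOM) when encoding. If you don't want the BOM, specify the byte order explicitly. UTF-16BE matches the big-endian default that ByteBuffer.asCharBuffer() uses; UTF-16LE also omits the BOM, but then the buffer's order must be set to ByteOrder.LITTLE_ENDIAN before calling asCharBuffer():
byte[] byteArray = "hello".getBytes(StandardCharsets.UTF_16BE);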
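As a quick way to see what each UTF-16 variant actually writes, the sketch below dumps the encoded bytes and reads them back through asCharBuffer(). The class and helper names (Utf16Variants, hex, viewBigEndian, viewLittleEndian) are hypothetical and only serve the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class Utf16Variants {
    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    // UTF-16BE writes no BOM and matches ByteBuffer's default big-endian order.
    public static String viewBigEndian(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16BE);
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    // UTF-16LE writes no BOM either, but the buffer order must be switched first.
    public static String viewLittleEndian(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16LE);
        return ByteBuffer.wrap(bytes)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asCharBuffer()
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("hi".getBytes(StandardCharsets.UTF_16)));   // FE FF 00 68 00 69
        System.out.println(hex("hi".getBytes(StandardCharsets.UTF_16BE))); // 00 68 00 69
        System.out.println(hex("hi".getBytes(StandardCharsets.UTF_16LE))); // 68 00 69 00
        System.out.println(viewBigEndian("hello"));                        // hello
        System.out.println(viewLittleEndian("hello"));                     // hello
    }
}
```

Only the plain UTF-16 charset prepends the two BOM bytes FE FF, which is where the sixth char in the original output comes from.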