Why is the first character in the CharBuffer returned by ByteBuffer::asCharBuffer always a space?

Question

I encountered the following behavior while using a ByteBuffer. It looks like a bug to me, but perhaps I'm using the libraries incorrectly.

Code:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class Main {
    public static void main(String[] args) {
        byte[] byteArray = "hello".getBytes(Charset.forName("UTF-16"));
        CharBuffer buffer = ByteBuffer.wrap(byteArray).asCharBuffer();
        System.out.println(buffer.length());
        for (int i = 0; i < buffer.length(); i++) {
            System.out.print(buffer.get(i));
        }
    }
}

Output:

6
 hello

What's the deal with the leading space? Am I doing something wrong? Is this expected behavior? If so, why?
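To see where the extra character comes from, it can help to print the encoded bytes and the code point of the first char in the view buffer. The following is a minimal sketch (the class name InspectUtf16Bytes is just for illustration); it assumes Java 7 or later for StandardCharsets:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class InspectUtf16Bytes {
    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_16);

        // Print every encoded byte as two hex digits.
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // FE FF 00 68 00 65 00 6C 00 6C 00 6F

        // asCharBuffer() pairs up bytes using the buffer's order (big-endian by default).
        CharBuffer chars = ByteBuffer.wrap(bytes).asCharBuffer();
        System.out.printf("U+%04X%n", (int) chars.get(0)); // U+FEFF
    }
}
```

The first two bytes, FE FF, are not part of "hello"; they are what asCharBuffer() turns into the extra first character.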

xingbin · Accepted Answer

It seems your system uses UTF-8 as its default charset, while you are encoding "hello" with UTF-16. You can check the default with:

System.out.println(System.getProperty("file.encoding")); // UTF-8 on my machine
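A related check is Charset.defaultCharset(), which returns the charset that getBytes() and new String(byte[]) use when no charset is passed. A small sketch printing both values side by side (the class name ShowDefaultCharset is hypothetical):

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    public static void main(String[] args) {
        // The value of the file.encoding system property.
        System.out.println(System.getProperty("file.encoding"));
        // The JVM's default charset, which the Charset docs say is determined during startup.
        System.out.println(Charset.defaultCharset());
    }
}
```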

Solution

Since you encoded the string with UTF-16, you should also decode the bytes into a CharBuffer with UTF-16:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class Main {
    public static void main(String[] args) {
        Charset utf16 = Charset.forName("UTF-16");
        byte[] byteArray = "hello".getBytes(utf16);
        ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
        // Decode with the same charset; the UTF-16 decoder consumes the BOM.
        CharBuffer buffer = utf16.decode(byteBuffer);
        System.out.println(buffer.length());  // 5
        for (int i = 0; i < buffer.length(); i++) {
            System.out.print(buffer.get(i)); // hello
        }
    }
}
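The difference between reinterpreting bytes and actually decoding them can be shown side by side. A minimal sketch (hypothetical class name CompareDecoding), using StandardCharsets.UTF_16, which refers to the same charset as Charset.forName("UTF-16"):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class CompareDecoding {
    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_16);

        // Raw reinterpretation: every 2 bytes become one char, the BOM included.
        CharBuffer raw = ByteBuffer.wrap(bytes).asCharBuffer();

        // Real decoding: the UTF-16 decoder reads the BOM to pick the byte order and drops it.
        CharBuffer decoded = StandardCharsets.UTF_16.decode(ByteBuffer.wrap(bytes));

        System.out.println(raw.length() + " " + decoded.length()); // 6 5
        System.out.println(decoded.toString().equals("hello"));    // true
    }
}
```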

If you insist on keeping the original code, you can place this line before it, which sets the file.encoding system property to UTF-16:
```
System.setProperty("file.encoding", "UTF-16");
```

DodgyCodeException · Answer

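Java's UTF-16 charset is documented to write a byte order mark (BOM) when encoding: it uses big-endian byte order and puts the bytes FE FF in front of the text. asCharBuffer() does not decode anything; it simply reads each pair of bytes as one char, so the BOM becomes the character U+FEFF (zero-width no-break space), which shows up as the leading "space" in your output. If you don't want the BOM, specify a charset with an explicit byte order. UTF-16BE matches the big-endian order a ByteBuffer uses by default (UTF-16LE also works, but only if you call order(ByteOrder.LITTLE_ENDIAN) on the buffer first):

byte[] byteArray = "hello".getBytes(StandardCharsets.UTF_16BE);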
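The choice between UTF-16BE and UTF-16LE interacts with the ByteBuffer's byte order, because asCharBuffer() reads char values in the buffer's current order (big-endian unless order(...) is called). A minimal sketch, with a hypothetical helper named view, showing that the encoding and the buffer order have to agree:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class ByteOrderMatters {
    // Reinterprets the bytes as chars using the given byte order and returns them as a String.
    static String view(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        byte[] be = "hello".getBytes(StandardCharsets.UTF_16BE); // 00 68 00 65 ...
        byte[] le = "hello".getBytes(StandardCharsets.UTF_16LE); // 68 00 65 00 ...

        System.out.println(view(be, ByteOrder.BIG_ENDIAN));    // hello
        System.out.println(view(le, ByteOrder.LITTLE_ENDIAN)); // hello
        // Mismatched order: 68 00 is read as U+6800, a CJK ideograph, not 'h'.
        System.out.println(view(le, ByteOrder.BIG_ENDIAN).equals("hello")); // false
    }
}
```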

Question asked by Peter Enns