
Why is the first character in the CharBuffer returned by ByteBuffer::asCharBuffer always a space?

I encountered the following behavior while using a ByteBuffer. It looks like a bug to me, but perhaps I'm using the libraries incorrectly.

Code:

public static void main(String[] args) {
    byte[] byteArray = "hello".getBytes(Charset.forName("UTF-16"));
    CharBuffer buffer = ByteBuffer.wrap(byteArray).asCharBuffer();
    System.out.println(buffer.length());
    for (int i = 0; i < buffer.length(); i++) {
        System.out.print(buffer.get(i));
    }
}

Output:

6
 hello

What's the deal with the leading space? Am I doing something wrong? Is this expected behavior? If so, why?

Peter Enns asked Apr 25 '18 15:04


2 Answers

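You encode hello with UTF-16 but never decode it: asCharBuffer() only reinterprets each pair of bytes as a char, so the byte order mark that the UTF-16 encoder prepends shows up as the first character. The default charset plays no part in this, although you can check it with: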

System.out.println(System.getProperty("file.encoding")); // UTF-8 on my machine
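
To see what the leading character actually is, it can help to dump the encoded bytes and the code point of the first char. The following is a minimal sketch, not part of the original answer; the class name BomInspect and its helpers are hypothetical, and it assumes the standard "UTF-16" charset, which writes a big-endian byte order mark when encoding.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class BomInspect {
    // Hex dump of a byte array, e.g. "FE FF 00 68".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    // First char as seen through asCharBuffer(), which does no charset decoding.
    static int firstChar(byte[] bytes) {
        CharBuffer chars = ByteBuffer.wrap(bytes).asCharBuffer();
        return chars.get(0);
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_16);
        System.out.println(bytes.length);                 // 12
        System.out.println(hex(bytes));                   // FE FF 00 68 00 65 00 6C 00 6C 00 6F
        System.out.printf("U+%04X%n", firstChar(bytes));  // U+FEFF
    }
}
```

The two extra bytes FE FF are the byte order mark. Viewed as a char they become U+FEFF, which many consoles render as a blank or a question mark.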

Solution

  • Since you encode it with UTF-16, you should also decode it into a CharBuffer with UTF-16:

    public static void main(String[] args) {
        Charset utf16 = Charset.forName("UTF-16");
        byte[] byteArray = "hello".getBytes(utf16);
        ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
        CharBuffer buffer = utf16.decode(byteBuffer);
        System.out.println(buffer.length());  // 5
        for (int i = 0; i < buffer.length(); i++) {
            System.out.print(buffer.get(i)); // hello
        }
    }

  • If you insist on keeping the original code, note that changing the default charset as shown here is unlikely to help, because asCharBuffer() never consults the default charset and file.encoding is normally read only once at JVM startup:

    System.out.println(System.setProperty("file.encoding", "UTF-16"));
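
If the asCharBuffer() call from the question has to stay, one possible workaround is to skip the byte order mark explicitly. This is a minimal sketch rather than anything from the original answer; the class SkipBom and the helper charsWithoutBom are hypothetical names, and it assumes big-endian UTF-16 input such as the "UTF-16" charset produces.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class SkipBom {
    // View the bytes as chars and drop a leading U+FEFF if one is present.
    static CharBuffer charsWithoutBom(byte[] bytes) {
        CharBuffer chars = ByteBuffer.wrap(bytes).asCharBuffer();
        if (chars.hasRemaining() && chars.get(0) == '\uFEFF') {
            chars.position(1);     // move past the BOM
            return chars.slice();  // new view whose index 0 is the first real char
        }
        return chars;
    }

    public static void main(String[] args) {
        byte[] byteArray = "hello".getBytes(StandardCharsets.UTF_16);
        CharBuffer buffer = charsWithoutBom(byteArray);
        System.out.println(buffer.length());  // 5
        for (int i = 0; i < buffer.length(); i++) {
            System.out.print(buffer.get(i));  // hello
        }
    }
}
```

The slice() call matters because get(i) is an absolute read: moving the position alone would still return the BOM at index 0.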
    
xingbin answered Oct 11 '22 04:10


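The UTF-16 encoding is specifically documented to produce a Byte Order Mark. If you don't want the BOM, specify an explicit byte order. A wrapped ByteBuffer is big-endian by default, so for asCharBuffer() that means UTF-16BE (UTF-16LE would also require calling order(ByteOrder.LITTLE_ENDIAN) on the buffer):

byte[] byteArray = "hello".getBytes(StandardCharsets.UTF_16BE);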
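
How the byte order of the encoding interacts with asCharBuffer() can be sketched as follows. This is an illustration under stated assumptions, not part of the original answer: the class Utf16Order and its two helpers are hypothetical names, and it relies on a wrapped ByteBuffer starting out in big-endian order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class Utf16Order {
    // Big-endian bytes match the default byte order of a wrapped ByteBuffer.
    static String viaBigEndian(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16BE);
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    // Little-endian bytes need the buffer's order switched before taking the view.
    static String viaLittleEndian(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16LE);
        return ByteBuffer.wrap(bytes)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asCharBuffer()
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(viaBigEndian("hello"));     // hello
        System.out.println(viaLittleEndian("hello"));  // hello
    }
}
```

Neither variant writes a BOM, so both views contain exactly five chars.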

DodgyCodeException answered Oct 11 '22 04:10