byte[] byteArray = Charset.forName("UTF-8").encode("hello world").array();
System.out.println(byteArray.length);
Why does the code above print 12? Shouldn't it print 11 instead?
After you've written to the ByteBuffer, the number of bytes you've written can be found with the position() method. If you then flip() the buffer, the number of bytes in the buffer can be found with the limit() or remaining() methods.
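A minimal sketch of those counters, assuming a 16-byte heap buffer and three arbitrary bytes (the class and method names here are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class FlipDemo {
    // Returns {position before flip, limit after flip, remaining after flip, capacity}.
    static int[] counters() {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put((byte) 1).put((byte) 2).put((byte) 3);
        int written = buf.position(); // bytes written so far
        buf.flip();                   // limit = old position, position = 0
        return new int[] { written, buf.limit(), buf.remaining(), buf.capacity() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(counters())); // [3, 3, 3, 16]
    }
}
```

Note that the capacity stays at 16 throughout; only position and limit track how much data is in the buffer.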
The toString() method of ByteBuffer returns a string summarizing the buffer's state (its class name, position, limit, and capacity), not the data it contains. To get the character sequence held in the buffer, decode it with a Charset, for example StandardCharsets.UTF_8.decode(buffer).toString().
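To see what toString() yields on a heap buffer next to what decoding yields, here is a small sketch. The helper names are hypothetical, and the buffer is wrapped around the UTF-8 bytes of "hello" purely as sample input:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ToStringDemo {
    // State summary of the buffer: class name, position, limit, capacity.
    static String summary(ByteBuffer buf) {
        return buf.toString();
    }

    // Text held in the buffer. duplicate() shares the bytes but has its own
    // position, so the caller's buffer is not consumed by decoding.
    static String text(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(summary(buf)); // java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]
        System.out.println(text(buf));    // hello
    }
}
```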
So a string's size is 18 + (2 * number of characters) bytes. (In reality, another 2 bytes are sometimes used as padding to ensure 32-bit alignment, but I'll ignore that.) Each character needs 2 bytes.
By default, the order of a ByteBuffer object is BIG_ENDIAN. If a byte order is passed as a parameter to the order method, it modifies the byte order of the buffer and returns the buffer itself. The new byte order may be either LITTLE_ENDIAN or BIG_ENDIAN.
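As a small illustration of the effect of the byte order, the sketch below writes the int value 1 into a 4-byte buffer under each order (the class and method names are invented for this example):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class OrderDemo {
    // Encodes the int 1 using the given byte order and returns the 4 raw bytes.
    static byte[] encodeOne(ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order); // order() returns the buffer itself
        buf.putInt(1);
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(ByteBuffer.allocate(4).order());                        // BIG_ENDIAN
        System.out.println(Arrays.toString(encodeOne(ByteOrder.BIG_ENDIAN)));    // [0, 0, 0, 1]
        System.out.println(Arrays.toString(encodeOne(ByteOrder.LITTLE_ENDIAN))); // [1, 0, 0, 0]
    }
}
```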
The length of the array is the ByteBuffer's capacity, which is derived from, but not equal to, the number of characters you are encoding. Let's take a look at how memory is allocated for the ByteBuffer:
...
If you drill into the encode() method, you'll find that CharsetEncoder#encode(CharBuffer) looks like this:
public final ByteBuffer encode(CharBuffer in)
    throws CharacterCodingException
{
    int n = (int)(in.remaining() * averageBytesPerChar());
    ByteBuffer out = ByteBuffer.allocate(n);
    ...
According to my debugger, the averageBytesPerChar of a UTF_8$Encoder is 1.1, and the input String has 11 characters. 11 * 1.1 = 12.1, and the code casts the total to an int when it does the calculation, so the resulting size of the ByteBuffer is 12.
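The arithmetic can be reproduced outside the JDK. This is only a sketch that mirrors the sizing expression quoted above; initialSize is a hypothetical helper, and the 1.1 factor is the value observed in the debugger, which is a detail of that particular JDK's UTF-8 encoder and may differ elsewhere:

```java
import java.nio.charset.StandardCharsets;

class AllocationEstimate {
    // Mirrors (int)(in.remaining() * averageBytesPerChar()): the cast truncates.
    static int initialSize(int remainingChars, float averageBytesPerChar) {
        return (int) (remainingChars * averageBytesPerChar);
    }

    public static void main(String[] args) {
        // What this runtime's UTF-8 encoder reports (implementation-specific).
        System.out.println(StandardCharsets.UTF_8.newEncoder().averageBytesPerChar());
        // 11 chars * 1.1 = 12.1, truncated to 12.
        System.out.println(initialSize("hello world".length(), 1.1f)); // 12
    }
}
```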
Because it returns a ByteBuffer. That's the buffer's capacity (not really even that because of possible slicing), not how many bytes are used. It's a bit like how malloc(10) is free to return 32 bytes of memory.
System.out.println(Charset.forName("UTF-8").encode("hello world").limit());
That's 11 (as expected).
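If what you want is a byte[] holding exactly the encoded bytes, two common options are to copy remaining() bytes out of the buffer, or to skip the ByteBuffer and call String.getBytes with a charset. A minimal sketch (class and method names are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ExactBytes {
    // Copies only the bytes between position and limit, ignoring spare capacity.
    static byte[] viaBuffer(String s) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode(s);
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Encodes straight to a right-sized array.
    static byte[] viaGetBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(viaBuffer("hello world").length);   // 11
        System.out.println(viaGetBytes("hello world").length); // 11
    }
}
```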