Based on the article here, compareTo method on the ByteBuffers might not work correctly when dealing with negative numbers
bytes in Java are signed, contrary to what one typically expects. What is easy to miss
though, is the fact that this affects ByteBuffer.compareTo() as well. The Java API
documentation for that method reads:
"Two byte buffers are compared by comparing their sequences of remaining elements
lexicographically, without regard to the starting position of each sequence within its
corresponding buffer."
A quick reading might lead one to believe the result is what you would typically expect,
but of course given the definition of a byte in Java, this is not the case. The result
is that the order of byte buffers that contains values with the highest order bit set,
will diverge from what you may be expecting.
I tried a couple of examples of putting negative values into the buffer, and comparing with positives, it was always OK. Is the article speaking of the case when we e.g. read in binary data, when an integer -1
was stored as 100000...001
and that would cause issues?
The article is claiming that a ByteBuffer
containing bytes with their high bits set (i.e., bytes with values in the range 0x80
to 0xFF
) will be treated as negative values when compared to the corresponding bytes in another buffer. In other words, each byte is treated as an 8-bit signed integer. Thus you should expect that a byte value of 0x90
will compare less than a byte value of 0x30
.
At least, that's the theory, given the standard behavior of byte
value in Java. In practice, I would expect that bytes would be compared as 8-bit unsigned integers, so that a byte of 0x90
would compare greater than a byte of 0x30
.
It all depends on how the term "lexicographical order" is to be interpreted. If each byte represents an 8-bit character code, for example, then it should logically be treated as an unsigned value (regardless of how Java normally treats byte
objects).
So it boils down to how the comparison of two ByteBuffers
is actually implemented. One way is to treat the bytes as Java signed integers, as:
// Note: Code has be simplified for brevity
int compare(byte[] buf1, byte[] buf2)
{
...
for (int i = 0; i < buf1.length && i < buf2.length; i++)
{
int cmp = buf1[i] - buf2[i]; // Signed arithmetic
if (cmp != 0)
return cmp;
}
return (buf1.length - buf2.length);
}
The other way is to treat the bytes as unsigned integers, which uses slightly different arithmetic for the comparison:
int cmp = (buf1[i] & 0xFF) - (buf2[i] & 0xFF); // Unsigned arithmetic
As I said above, I would expect the second approach to be used by most implementations, without any specific definition of "lexicographical order" given.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With