Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ByteBuffer - compareTo method might diverge

Based on the article here, compareTo method on the ByteBuffers might not work correctly when dealing with negative numbers

bytes in Java are signed, contrary to what one typically expects. What is easy to miss
though, is the fact that this affects ByteBuffer.compareTo() as well. The Java API
documentation for that method reads:

"Two byte buffers are compared by comparing their sequences of remaining elements 
lexicographically, without regard to the starting position of each sequence within its
corresponding buffer."

A quick reading might lead one to believe the result is what you would typically expect, 
but of course given the definition of a byte in Java, this is not the case. The result 
is that the order of byte buffers that contains values with the highest order bit set,   
will diverge from what you may be expecting.

I tried a couple of examples of putting negative values into the buffer, and comparing with positives, it was always OK. Is the article speaking of the case when we e.g. read in binary data, when an integer -1 was stored as 100000...001 and that would cause issues?

like image 824
Bober02 Avatar asked Nov 01 '22 03:11

Bober02


1 Answers

The article is claiming that a ByteBuffer containing bytes with their high bits set (i.e., bytes with values in the range 0x80 to 0xFF) will be treated as negative values when compared to the corresponding bytes in another buffer. In other words, each byte is treated as an 8-bit signed integer. Thus you should expect that a byte value of 0x90 will compare less than a byte value of 0x30.

At least, that's the theory, given the standard behavior of byte value in Java. In practice, I would expect that bytes would be compared as 8-bit unsigned integers, so that a byte of 0x90 would compare greater than a byte of 0x30.

It all depends on how the term "lexicographical order" is to be interpreted. If each byte represents an 8-bit character code, for example, then it should logically be treated as an unsigned value (regardless of how Java normally treats byte objects).

So it boils down to how the comparison of two ByteBuffers is actually implemented. One way is to treat the bytes as Java signed integers, as:

// Note: Code has be simplified for brevity
int compare(byte[] buf1, byte[] buf2)
{
    ...
    for (int i = 0;  i < buf1.length  &&  i < buf2.length;  i++)
    {
        int cmp = buf1[i] - buf2[i];        // Signed arithmetic
        if (cmp != 0)
            return cmp;
    }
    return (buf1.length - buf2.length);
}

The other way is to treat the bytes as unsigned integers, which uses slightly different arithmetic for the comparison:

        int cmp = (buf1[i] & 0xFF) - (buf2[i] & 0xFF);  // Unsigned arithmetic

As I said above, I would expect the second approach to be used by most implementations, without any specific definition of "lexicographical order" given.

like image 54
David R Tribble Avatar answered Nov 09 '22 04:11

David R Tribble