I'm reading the Avro format specification and trying to understand its implementation. Here is the method for decoding a long value:
@Override
public long readLong() throws IOException {
    ensureBounds(10); // a long takes at most 10 bytes on the wire
    int b = buf[pos++] & 0xff; // read one byte as an unsigned value 0..255
    int n = b & 0x7f;          // keep its low 7 payload bits
    long l;
    if (b > 0x7f) {            // high bit set: another byte follows
        b = buf[pos++] & 0xff;
        n ^= (b & 0x7f) << 7;
        if (b > 0x7f) {
            b = buf[pos++] & 0xff;
            n ^= (b & 0x7f) << 14;
            if (b > 0x7f) {
                b = buf[pos++] & 0xff;
                n ^= (b & 0x7f) << 21;
                if (b > 0x7f) {
                    // only the low 28 bits can be set, so this won't carry
                    // the sign bit to the long
                    l = innerLongDecode((long) n); // remaining bytes
                } else {
                    l = n;
                }
            } else {
                l = n;
            }
        } else {
            l = n;
        }
    } else {
        l = n;
    }
    if (pos > limit) {
        throw new EOFException();
    }
    return (l >>> 1) ^ -(l & 1); // undo zig-zag: back to two's-complement
}
The question is: why do we always check whether the byte we just read is greater than 0x7f?
This is a variable-length encoding (often called a varint): the most significant bit of each byte is a flag that says whether another byte should be read, and the remaining 7 bits carry part of the value. This lets small values be encoded in fewer bytes than their fixed-width type would normally require. The caveat is that large values need more bytes than usual: a 64-bit long can take up to 10 bytes instead of 8. The scheme therefore pays off when most of the values you write are small in magnitude.
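To make the size trade-off concrete, here is a minimal encoder sketch. The Avro spec says longs are written with zig-zag coding followed by a variable-length encoding, and this sketch follows that idea, but the class `VarLongWriter` and its methods are hypothetical illustrations, not Avro's own `BinaryEncoder` code. Zig-zag maps signed values to unsigned ones so that small negative numbers also stay small (0 to 0, -1 to 1, 1 to 2, -2 to 3, ...), which is what the last line of `readLong` undoes.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of a zig-zag varint encoder for longs (not Avro's code). */
public class VarLongWriter {

    /** Zig-zag: interleave signed values so small magnitudes become small unsigned values. */
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63); // arithmetic shift spreads the sign bit
    }

    /** Encodes a long as 1 to 10 bytes, 7 payload bits per byte, low bits first. */
    public static byte[] encode(long value) {
        long n = zigZag(value);
        ByteArrayOutputStream out = new ByteArrayOutputStream(10);
        while ((n & ~0x7FL) != 0) {           // more than 7 bits left?
            out.write((int) ((n & 0x7F) | 0x80)); // payload + continuation flag
            n >>>= 7;                          // unsigned shift to the next 7 bits
        }
        out.write((int) n);                    // last byte: flag bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long[] samples = {0, -1, 1, 63, 64, 150, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long v : samples) {
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
        }
    }
}
```

With these samples, 0, -1, 1 and 63 fit in one byte, 64 and 150 need two, and `Long.MAX_VALUE` and `Long.MIN_VALUE` need ten, two more than a fixed 8-byte long.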
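Getting to your question, 0x7F is 0111_1111 in binary: every bit is set except the most significant one, which is used as the flag bit. Because `b` has already been masked with `& 0xff` to the range 0..255, `b > 0x7f` is true exactly when that flag bit is set, i.e. when another byte follows.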
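The unrolled method in the question can be read as the loop below. This is a hypothetical standalone sketch (the class `VarLongReader` and its byte-array constructor are illustrative, not Avro API) that reads from a plain array instead of Avro's buffered input, and it wraps the `EOFException` in an unchecked `UncheckedIOException` to keep it short. It also shows why the `& 0xff` mask matters: Java's `byte` is signed, so without it a byte such as 0xAC would become -84 and `b > 0x7f` would never be true.

```java
import java.io.EOFException;
import java.io.UncheckedIOException;

/** Hypothetical loop-based sketch of zig-zag varint decoding (not Avro's code). */
public class VarLongReader {
    private final byte[] buf;
    private int pos;

    public VarLongReader(byte[] buf) {
        this.buf = buf;
    }

    public long readLong() {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            if (pos >= buf.length) {
                // Avro throws EOFException; wrapped here so callers need no try/catch
                throw new UncheckedIOException(new EOFException());
            }
            if (shift > 63) {
                // an 11th byte cannot belong to a valid 64-bit value
                throw new IllegalStateException("malformed varint: too many bytes");
            }
            // mask to 0..255: Java bytes are signed, (byte) 0xAC alone would be -84
            b = buf[pos++] & 0xff;
            // low 7 bits are payload, each byte 7 bits above the previous one
            raw |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while (b > 0x7f); // flag bit set: another byte follows
        // undo zig-zag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
        return (raw >>> 1) ^ -(raw & 1);
    }
}
```

Reading the bytes `0x00 0x01 0x02 0xAC 0x02` with successive `readLong()` calls yields 0, -1, 1 and 150. The question's version unrolls the first four iterations by hand and accumulates them in an `int`, which avoids loop overhead for the common case of small values.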