Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read a variable length integer from the supplied InputStream

I was looking a bit above the Netflix Hollow library code (https://github.com/Netflix/hollow) and I found this function to which I can't find any sense (Warning: I don't know much about Java ). Theoretically, the function returns a variable-length integer from the InputStream.

/**
 * Read a variable length integer from the supplied InputStream
 */
public static int readVInt(InputStream in) throws IOException {
    byte b = (byte)in.read();

    if(b == (byte) 0x80)
        throw new RuntimeException("Attempting to read null value as int");

    int value = b & 0x7F;
    while ((b & 0x80) != 0) {
      b = (byte)in.read();
      value <<= 7;
      value |= (b & 0x7F);
    }

    return value;
}

I comment my doubts:

1) Int value = b & 0x7F : The result of this is always b, right? What's the point?

2) while ((b & 0x80) != 0) : The result of b & 0x80 (If b is an integer digit, that is, coded in decimal from 0 to 9, would be 48-57 in decimal) is always 0. Therefore, will never enter the loop...

like image 977
Toni Miguel López Avatar asked Mar 27 '26 23:03

Toni Miguel López


1 Answers

Honestly, it's kind of weird (IMHO) to find such low level implementations in Java; this kind of logic is more usual in C programs and other "close to the metal" languages. But I guess efficient streaming to millions of users doesn't come for free :)

b is a byte, not an integer. It's interpreted as 8 bits.
0x80 is 10000000, a mask with all bits set to zero except the higher-order ("first") one.
0x7F is 01111111, a mask with all bits set to one except the first one. The inverse of 0x80.
value is the integer value we want to read. This is an int which is 4 bytes in Java. So 00000000000000000000000000000000 initially.

The code reads a sequence of bytes, one by one. It uses the first bit as a mark for termination (0 means "last byte"), and concatenates the other 7 bits to value. The masks are applied to evaluate just those bits. So b & 0x80 is used to check if the first bit is set, while b & 0x7F is used to set the first bit to zero and keep the value of all the other bits.

Example:

  • We want to transmit the number 24612135 byte per byte. This is 1011101111000110100100111 in binary.
  • We split it in groups of 7 bits: 0001011 1011110 0011010 0100111 and make every group a 1-leading 8-bit byte, save for the last one which will be marked as 0-leading: 10001011 11011110 10011010 00100111
  • First byte gets read. So b = 10001011. Since this is not 0x80 (10000000), we know this is going to be a valid, non-null integer.
  • We discard the first bit (b & 0x7f is 00001011) and set value to that. Now value is 00000000000000000000000000001011
  • Now this could be a one-byte number, so we have to check for the first bit to know if we should continue reading. b & 0x80 is not 0, so we enter the loop to read more bytes.
  • Read the second byte. Now b = 11011110. Discard first bit (b & 0x7f) and put the other 7 bits inside value. But value already has some bits set, so we have to shift them to make room for 7 more bits: value <<= 7. Now value is 00000000000000000000010110000000. By using the bitwise OR operator value |= (b & 0x7F) we set the lowest 7 bits to the right value. Now value is 00000000000000000000010111011110.
  • The second byte didn't have the first bit set to 0 either, so we loop again for the third byte. Same logic; now value is 00000000000000101110111100011010.
  • The third byte also didn't have the first bit set to 0; loop again for the fourth byte, 00100111. Now value is 00000001011101111000110100100111. This is the correct value that we intended to send.
  • Because the fourth byte did have the first bit set to zero, the loop's break condition now evaluates to true, so we stop reading bytes.

Anyways, I hope this helped you.

like image 122
walen Avatar answered Mar 31 '26 06:03

walen



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!