I have a python list of integers, and I'd like to know how much space it will take up when encoded as a sequence of Protocol Buffers variable-length integers, or varints. What's the best way to figure this out without actually encoding the integers?
my_numbers = [20, 69, 500, 38987982344444, 420, 99, 1, 999]
e = MyCoolVarintArrayEncoder(my_numbers)
print(len(e)) # ???
Variable-width integers, or varints, are at the core of the wire format. They allow encoding unsigned 64-bit integers using anywhere between one and ten bytes, with small values using fewer bytes. Each byte in the varint has a continuation bit that indicates if the byte that follows it is part of the varint.
Protobuf encodes floats into little endian byte order (least significant byte comes first). As we saw earlier, float encoding representation is a bit different. Fortunately, python provides a way to make it easy. You can read more about struct from here.
A variable length integer is an encoding of 64-bit unsigned integers into between 1 and 9 bytes. The encoding has the following properties: Smaller (and more common) values use fewer bytes and take up less space than larger (and less common) values.
Each integer is encoded in base 128, one byte per "digit". The length of an integer value's representation in any base is ceil(log(value, base)).
Take the log(base=128) of each integer; round those values up to the nearest integer; sum those rounded values, and there's your length.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With