Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given an integer, how big is its varint encoding?

I have a python list of integers, and I'd like to know how much space it will take up when encoded as a sequence of Protocol Buffers variable-length integers, or varints. What's the best way to figure this out without actually encoding the integers?

my_numbers = [20, 69, 500, 38987982344444, 420, 99, 1, 999]
e = MyCoolVarintArrayEncoder(my_numbers)
print(len(e))  # ???
like image 836
Emmett Butler Avatar asked Jul 30 '18 23:07

Emmett Butler


People also ask

What is Varint encoding?

Variable-width integers, or varints, are at the core of the wire format. They allow encoding unsigned 64-bit integers using anywhere between one and ten bytes, with small values using fewer bytes. Each byte in the varint has a continuation bit that indicates if the byte that follows it is part of the varint.

How does Protobuf encode?

Protobuf encodes floats into little endian byte order (least significant byte comes first). As we saw earlier, float encoding representation is a bit different. Fortunately, python provides a way to make it easy. You can read more about struct from here.

What is variable length integer?

A variable length integer is an encoding of 64-bit unsigned integers into between 1 and 9 bytes. The encoding has the following properties: Smaller (and more common) values use fewer bytes and take up less space than larger (and less common) values.


1 Answers

Each integer is encoded in base 128, one byte per "digit". The length of an integer value's representation in any base is ceil(log(value, base)).

Take the log(base=128) of each integer; round those values up to the nearest integer; sum those rounded values, and there's your length.

like image 112
Prune Avatar answered Nov 03 '22 12:11

Prune