Go: Mechanics of binary encoding

Tags:

go

I'm trying to make a new binary encoding package because the standard Go encoding/binary package does not do exactly what I want.

What I do not understand is why encoding/binary is using x >>= 7 in binary.PutUvarint instead of x >>= 8. If I understand correctly this is deliberately shifting the bits by 7 instead of by 8, which results in an overall size of 80 bits to store a uint64 instead of the 64 bits that is clearly the number of bits required. Why? What is the reason for this? This must be something to do with the fact that it is generating variable length slices of bytes, but why would >>7 help this?

Here is the binary encoding function for your reference:

func PutUvarint(buf []byte, x uint64) int {
    i := 0
    for x >= 0x80 {
        buf[i] = byte(x) | 0x80
        x >>= 7
        i++
    }
    buf[i] = byte(x)
    return i + 1
}

466

asked Aug 19 '14 07:08

Alasdair

1 Answers

encoding/binary/varint.go

package binary

// This file implements "varint" encoding of 64-bit integers.
// The encoding is:
// - unsigned integers are serialized 7 bits at a time, starting with the
//   least significant bits
// - the most significant bit (msb) in each output byte indicates if there
//   is a continuation byte (msb = 1)
// - signed integers are mapped to unsigned integers using "zig-zag"
//   encoding: Positive values x are written as 2*x + 0, negative values
//   are written as 2*(^x) + 1; that is, negative numbers are complemented
//   and whether to complement is encoded in bit 0.
//
// Design note:
// At most 10 bytes are needed for 64-bit values. The encoding could
// be more dense: a full 64-bit value needs an extra byte just to hold bit 63.
// Instead, the msb of the previous byte could be used to hold bit 63 since we
// know there can't be more than 64 bits. This is a trivial improvement and
// would reduce the maximum encoding length to 9 bytes. However, it breaks the
// invariant that the msb is always the "continuation bit" and thus makes the
// format incompatible with a varint encoding for larger numbers (say 128-bit).

A classic technique for lossless data compression is Huffman coding where the more common symbols are generally represented using fewer bits than less common symbols. In practice, smaller numbers occur most often. Therefore, if we can represent small numbers efficiently, even if larger numbers are represented less efficiently, we will get lossless data compression.

Uvarints are a method of serializing unsigned integers using one or more bytes. Smaller numbers take a smaller number of bytes. Each byte in a uvarint, except the last byte, has the most significant bit (msb) set – this indicates that there are further bytes to come. The lower 7 bits of each byte are used to store the number in groups of 7 bits, least significant group first. We can do this for unsigned integers with any number of bits.

For example,

uint bits : uvarint bytes
7 7f : 1 7f 
14 3fff : 2 ff7f 
21 1fffff : 3 ffff7f 
28 fffffff : 4 ffffff7f 
35 7ffffffff : 5 ffffffff7f 
42 3ffffffffff : 6 ffffffffff7f 
49 1ffffffffffff : 7 ffffffffffff7f 
56 ffffffffffffff : 8 ffffffffffffff7f 
63 7fffffffffffffff : 9 ffffffffffffffff7f 
64 ffffffffffffffff : 10 ffffffffffffffffff01

And so on, for as many uint bits as we want.

If we have a lots of numbers that are represented as 1 to 49 bits of an unsigned 64-bit integer serialized to uvarints of 1 to 7 bytes, we won't care if we have a few numbers that are represented as 57 to 64 bits of an unsigned 64-bit integer serialized to uvarints of 9 to 10 bytes.

References:

Huffman coding

Variable-length quantity

128

answered Sep 30 '22 19:09

peterSO

Related questions
                            
                                What is the default value of a map of struct
                            
                                Why is unsafe.Sizeof considered unsafe?
                            
                                How to BulkWrite\UpdateMany with Go driver for MongoDB
                            
                                API Gateway, blocked by CORS policy: No 'Access-Control-Allow-Origin' header
                            
                                Cannot get connection between containers
                            
                                Performance: Sorting Slice vs Sorting Type (of Slice) with Sort implementation
                            
                                How do you convert a time offset to a location/timezone in Go
                            
                                kubernetes client-go: convert labelselector to label string
                            
                                Why error messages shouldn't end with a punctuation mark in Go?
                            
                                libxml2 vs expat for an XMPP server
                            
                                http.Handle(Handler or HandlerFunc)
                            
                                Go's gofmt and diff/VCS issues?
                            
                                How to properly call JSON-RPC in Go?
                            
                                golang interface compliance compile type check
                            
                                init of slice in struct
                            
                                Read until end of channel in Go
                            
                                Go Programming: Generating Combinations
                            
                                What does asterisk (*struct) notation mean in golang
                            
                                How can I create a pair type in go?
                            
                                How do I handle errors in a worker pool using WaitGroup?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With