Is encoding/gob deterministic?

Tags: go, gob

Given two Go values x and y such that x equals y (assuming no trickiness with interfaces and maps, just structs and arrays), can we expect the output of gob_encode(x) and gob_encode(y) to always be the same?

edit (Jun 8 2018):

gob encoding is non-deterministic when maps are involved. Map iteration order in Go is random, so the map's entries are serialised in a random order.

dpington asked Oct 20 '15 05:10


3 Answers

You shouldn't really care as long as it "gets the job done". That said, the current encoding/gob implementation is deterministic, with some important caveats (continue reading).

From the encoding/gob package documentation:

A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.

This means that the first time you encode a value of a given type, its type information is sent along with it. If you then encode another value of the same type with the same encoder, the type description is not transmitted again, only a reference to the earlier spec. So even if you encode the same value twice, the two calls produce different byte sequences: the first contains the type spec and the value, the second contains only a type reference (e.g. a type id) and the value.

See this example:

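package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

type Int struct{ X int }

func main() {
    b := &bytes.Buffer{}
    e := gob.NewEncoder(b)

    // The first Encode sends the type definition for Int, then the value.
    e.Encode(Int{1})
    fmt.Println(b.Bytes())

    // Later Encodes on the same encoder send only the type id and the value.
    e.Encode(Int{1})
    fmt.Println(b.Bytes())

    e.Encode(Int{1})
    fmt.Println(b.Bytes())
}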

Output (try it on the Go Playground):

[23 255 129 3 1 1 3 73 110 116 1 255 130 0 1 1 1 1 88 1 4 0 0 0 5 255 130 1 2 0]
[23 255 129 3 1 1 3 73 110 116 1 255 130 0 1 1 1 1 88 1 4 0 0 0 5 255 130 1 2 0 5 255 130 1 2 0]
[23 255 129 3 1 1 3 73 110 116 1 255 130 0 1 1 1 1 88 1 4 0 0 0 5 255 130 1 2 0 5 255 130 1 2 0 5 255 130 1 2 0]

As you can see, the first Encode() writes a long type description followed by the encoded Int value, [5 255 130 1 2 0]. The second and third calls each append only that same [5 255 130 1 2 0] sequence.

But if you create 2 different gob.Encoders and write the same values in the same order to each, they will produce exactly the same output.

Note that "same order" in the previous statement matters. A type specification is transmitted when the first value of that type is sent, so sending values of different types in a different order transmits the type specs in a different order too. The references/identifiers of the types may then differ, which means a different type reference/id may be sent when a value of that type is encoded.

Also note that the implementation of the gob package may change from release to release. Such changes will be backward compatible (a backward-incompatible change would have to be stated explicitly), but backward compatible does not mean the output stays byte-for-byte the same. So different Go versions may produce different results, although every result can still be decoded by every compatible version.

icza answered Nov 07 '22 17:11


It should probably be noted that the accepted answer is not correct: encoding/gob doesn't order map elements in a deterministic way: https://play.golang.org/p/Hh3_5Kb3Znn

I've forked encoding/gob and added some code to order maps by key before writing them to the stream. This will affect performance, but my particular application doesn't need high performance. Remember custom marshalers can break this, so use with care: https://github.com/dave/stablegob

David Brophy answered Nov 07 '22 18:11


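The output also isn't deterministic across programs that encode different types, even when every value gets its own encoder: the bytes produced for one type depend on which other types were encoded earlier in the same process.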

Example:

package main

import (
    "bytes"
    "crypto/sha1"
    "encoding/gob"
    "encoding/hex"
    "log"
)

func main() {
    // Comment out encint() or encint64() to see the output of encstring() change.
    encint()
    encint64()
    encstring()
}

func encint() {
    s1 := []int{0, 2, 4, 5, 7}
    buf2 := bytes.Buffer{}
    enc2 := gob.NewEncoder(&buf2)
    enc2.Encode(s1)
}

func encint64() {
    s1 := []int64{0, 2, 4, 5, 7}
    buf2 := bytes.Buffer{}
    enc2 := gob.NewEncoder(&buf2)
    enc2.Encode(s1)
}

func encstring() {
    s1 := []string{"a", "b", "c", "d"}
    buf2 := bytes.Buffer{}
    enc2 := gob.NewEncoder(&buf2)
    enc2.Encode(s1)
    log.Println(buf2.Bytes())

    hash := sha1.New()
    hash.Write(buf2.Bytes())
    ret := hash.Sum(nil)
    log.Println(hex.EncodeToString(ret))
}

Run in Go Playground

Notice that if you comment out encint() or encint64(), encstring() produces different bytes and a different hash.

This happens even though each function uses its own encoder and its own buffer.

hbt answered Nov 07 '22 17:11