
Murmur3 Hash Compatibility Between Go and Python

We have two different libraries, one in Python and one in Go, that need to compute murmur3 hashes identically. Unfortunately, no matter what we try, we cannot get the libraries to produce the same result. It appears from this SO question about Java and Python that compatibility isn't necessarily straightforward.

Right now we're using the Python mmh3 and Go github.com/spaolacci/murmur3 libraries.

In Go:

hash := murmur3.New128()
hash.Write([]byte("chocolate-covered-espresso-beans"))
fmt.Println(base64.RawURLEncoding.EncodeToString(hash.Sum(nil)))
// Output: cLHSo2nCBxyOezviLM5gwg

In Python:

import base64
import mmh3

name = "chocolate-covered-espresso-beans"
hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='big', signed=False)
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: jns74izOYMJwsdKjacIHHA (big byteorder)

hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='little', signed=False)
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: HAfCaaPSsXDCYM4s4jt7jg (little byteorder)

hash = mmh3.hash_bytes(name.encode('utf-8'))
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: HAfCaaPSsXDCYM4s4jt7jg

In Go, murmur3 returns uint64 values, so we assumed signed=False in Python; however, we also tried signed=True and did not get matching hashes.
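As a side note on the signed/unsigned question, here is a minimal sketch (standard library only, no mmh3 call) suggesting why that flag is unlikely to be the culprit. It assumes the 128-bit value below, which is simply what the big-endian output jns74izOYMJwsdKjacIHHA decodes to; the variable names are illustrative. Reading the same bits as signed or unsigned should serialize to identical bytes, provided to_bytes is told which interpretation is in use.

```python
# Assumption: this constant is the 128-bit value behind the big-endian
# Python output "jns74izOYMJwsdKjacIHHA" (decoded by hand, not via mmh3).
unsigned = 0x8E7B3BE22CCE60C270B1D2A369C2071C

# The same bit pattern reinterpreted as a signed two's complement integer.
signed = unsigned - (1 << 128) if unsigned >= (1 << 127) else unsigned

unsigned_bytes = unsigned.to_bytes(16, byteorder="big", signed=False)
signed_bytes = signed.to_bytes(16, byteorder="big", signed=True)

# Both interpretations produce the same 16 bytes.
print(unsigned_bytes == signed_bytes)
```

So the signedness only changes how the integer is displayed, not the bytes that end up base64 encoded.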

We're open to different libraries, but are wondering if there is something wrong with either our Go or Python methodologies of computing a base64 encoded hash from a string. Any help appreciated.

asked Sep 12 '25 08:09 by bbengfort


1 Answer

That first Python result is almost right.

>>> binascii.hexlify(base64.b64decode('jns74izOYMJwsdKjacIHHA=='))
b'8e7b3be22cce60c270b1d2a369c2071c'
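The same decoding can be applied to both base64 strings from the question to compare them byte for byte. This is a rough sketch using only the standard library; decode_unpadded is a hypothetical helper name, needed because the question strips the "=" padding before printing.

```python
import base64


def decode_unpadded(text: str) -> bytes:
    """Decode URL-safe base64 whose '=' padding was stripped."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


go_bytes = decode_unpadded("cLHSo2nCBxyOezviLM5gwg")  # Go output from the question
py_bytes = decode_unpadded("jns74izOYMJwsdKjacIHHA")  # Python big-endian output

print(go_bytes.hex())
print(py_bytes.hex())

# Compare the Go bytes against the Python bytes with the two 8-byte halves swapped.
print(go_bytes == py_bytes[8:] + py_bytes[:8])
```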

In Go:

    x, y := murmur3.Sum128([]byte("chocolate-covered-espresso-beans"))
    fmt.Printf("%x %x\n", x, y)

Results in:

70b1d2a369c2071c 8e7b3be22cce60c2

So the two 64-bit words are in the opposite order: Go emits first the word that ends up in the low half of Python's 128-bit integer. To get the same result in Python, you can try something like:

import base64
import mmh3

name = "chocolate-covered-espresso-beans"
hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='big', signed=False)
hash = hash[8:] + hash[:8]
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# cLHSo2nCBxyOezviLM5gwg
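
If the swap is needed in more than one place, it could be wrapped in a small helper that works on the integer directly. This is only a sketch: go_compatible_digest is a hypothetical name, and the constant stands in for the unsigned value of mmh3.hash128 for this input (it is the hex dump shown above), so the snippet runs without mmh3 installed.

```python
import base64


def go_compatible_digest(h128: int) -> bytes:
    """Serialize an unsigned 128-bit hash as low word first, then high word.

    Hypothetical helper: each 64-bit word is written big-endian, which
    matches the byte layout observed in the Go output above.
    """
    mask = (1 << 64) - 1
    high = h128 >> 64
    low = h128 & mask
    return low.to_bytes(8, "big") + high.to_bytes(8, "big")


# Assumption: stand-in for mmh3.hash128(name.encode("utf-8"), signed=False).
value = 0x8E7B3BE22CCE60C270B1D2A369C2071C

digest = go_compatible_digest(value)
print(base64.urlsafe_b64encode(digest).decode("ascii").rstrip("="))
# cLHSo2nCBxyOezviLM5gwg
```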
answered Sep 14 '25 22:09 by kichik