I need (and can't find) a pure python (no c++) implementation of MurmurHash, and I'm too novice to write myself. Speed or memory usage doesn't matter on my project.
I find a attempt here, but it's limited to 31bits hash and I really need a 64bits hash.
Note : for those who need a fast implementation, there is a MurmurHash2 library here and a MurmurHash3 library here
mmh3 is a Python wrapper for MurmurHash (MurmurHash3), a set of fast and robust non-cryptographic hash functions invented by Austin Appleby.
this is untested (sorry!), but here is a version I came up with. Python allows for arbitrarily large integers, so I created a mask for the first 8 bytes (or 64 bits) that I then apply (via bitwise AND) to all arithmetic results that could produce integers larger than 64bits. Maybe somebody else could comment on the general approach and possible issues with endianness, etc.
def bytes_to_long(bytes): assert len(bytes) == 8 return sum((b << (k * 8) for k, b in enumerate(bytes))) def murmur(data, seed): m = 0xc6a4a7935bd1e995 r = 47 MASK = 2 ** 64 - 1 data_as_bytes = bytearray(data) h = seed ^ ((m * len(data_as_bytes)) & MASK) for ll in range(0, len(data_as_bytes), 8): k = bytes_to_long(data_as_bytes[ll:ll + 8]) k = (k * m) & MASK k = k ^ ((k >> r) & MASK) k = (k * m) & MASK h = (h ^ k) h = (h * m) & MASK l = len(data_as_bytes) & 7 if l >= 7: h = (h ^ (data_as_bytes[6] << 48)) if l >= 6: h = (h ^ (data_as_bytes[5] << 40)) if l >= 5: h = (h ^ (data_as_bytes[4] << 32)) if l >= 4: h = (h ^ (data_as_bytes[3] << 24)) if l >= 3: h = (h ^ (data_as_bytes[4] << 16)) if l >= 2: h = (h ^ (data_as_bytes[4] << 8)) if l >= 1: h = (h ^ data_as_bytes[4]) h = (h * m) & MASK h = h ^ ((h >> r) & MASK) h = (h * m) & MASK h = h ^ ((h >> r) & MASK) return h
fix the bugs in Nikolas's version
def bytes_to_long(bytes): assert len(bytes) == 8 return sum((b << (k * 8) for k, b in enumerate(bytes))) def murmur64(data, seed = 19820125): m = 0xc6a4a7935bd1e995 r = 47 MASK = 2 ** 64 - 1 data_as_bytes = bytearray(data) h = seed ^ ((m * len(data_as_bytes)) & MASK) off = len(data_as_bytes)/8*8 for ll in range(0, off, 8): k = bytes_to_long(data_as_bytes[ll:ll + 8]) k = (k * m) & MASK k = k ^ ((k >> r) & MASK) k = (k * m) & MASK h = (h ^ k) h = (h * m) & MASK l = len(data_as_bytes) & 7 if l >= 7: h = (h ^ (data_as_bytes[off+6] << 48)) if l >= 6: h = (h ^ (data_as_bytes[off+5] << 40)) if l >= 5: h = (h ^ (data_as_bytes[off+4] << 32)) if l >= 4: h = (h ^ (data_as_bytes[off+3] << 24)) if l >= 3: h = (h ^ (data_as_bytes[off+2] << 16)) if l >= 2: h = (h ^ (data_as_bytes[off+1] << 8)) if l >= 1: h = (h ^ data_as_bytes[off]) h = (h * m) & MASK h = h ^ ((h >> r) & MASK) h = (h * m) & MASK h = h ^ ((h >> r) & MASK) return h
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With