I've been trying to get a high level understanding of what MurmurHash does.
I've read a basic description but have yet to find a good explanation of when to use it and why. I know its very fast but want to know a bit more.
I asked a related question about how I could fit a UUID into a Redis bitset, and someone suggested using MurmurHash. It works but I'd like to understand the risks/benefits.
MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup. It was created by Austin Appleby in 2008 and is currently hosted on GitHub along with its test suite named 'SMHasher'. It also exists in a number of variants, all of which have been released into the public domain.
Murmur or Murmerhash is a modern non-cryptographic hash function with a low collision rate and high performance. It is thus suitable for general hash-based lookups and unsuitable for cryptographic uses. The name is composed of multiply (MU) and rotate (R), used in its inner loop.
mmh3 is a Python wrapper for MurmurHash (MurmurHash3), a set of fast and robust non-cryptographic hash functions invented by Austin Appleby.
A cryptographic hash function is an algorithm that takes an arbitrary amount of data input—a credential—and produces a fixed-size output of enciphered text called a hash value, or just “hash.” That enciphered text can then be stored instead of the password itself, and later used to verify the user.
Murmur is a family of good general purpose hashing functions, suitable for non-cryptographic usage. As stated by Austin Appleby, MurmurHash provides the following benefits:
You can certainly use it to hash UUIDs (like any other advanced hashing functions: CityHash, Jenkins, Paul Hsieh's, etc ...). Now, a Redis bitset is limited to 4 GB bits (512 MB). So you need to reduce 128 bits of data (UUID) to 32 bits (hashed value). Whatever the quality of the hashing function, there will be collisions.
Using an engineered hash function like Murmur will maximize the quality of the distribution, and minimize the number of collisions, but it offers no other guarantee.
Here are some links comparing the quality of general purpose hash functions:
http://www.azillionmonkeys.com/qed/hash.html
http://www.strchr.com/hash_functions
http://blog.aggregateknowledge.com/2011/12/05/choosing-a-good-hash-function-part-1/
http://blog.aggregateknowledge.com/2011/12/29/choosing-a-good-hash-function-part-2/
http://blog.aggregateknowledge.com/2012/02/02/choosing-a-good-hash-function-part-3/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With