Which hash functions to use in a Bloom filter

Tags: function, hash, bloom-filter

I've got the following question about choosing hash functions for Bloom filters:

Which functions to use?

In nearly every document/paper you can read that the hash functions used in a Bloom filter should be independent and uniformly distributed.

I know what is meant by this (independent and uniformly distributed), but I'm having trouble finding an argument or a discussion of which hash functions fulfill those requirements and are therefore suitable. Many posts suggest using the FNV or Murmur hash functions, but they don't explain why (or at least don't prove that) these functions are suitable.

Thanks in advance!


asked Aug 14 '12 14:08

Torsten

2 Answers

I asked myself the same question when building a Java Bloom filter library. See the GitHub readme for a detailed treatment of my analysis of hash functions for Bloom filters.

I looked at the problem from two perspectives:

How fast is the computation?
How uniform is the output distribution?

Speed can easily be measured with benchmarks on random input. Uniformity is a bit harder and requires some statistics. Using chi-squared goodness-of-fit tests, I measured how closely the distribution of hash values matches a uniform distribution.
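To make the uniformity check concrete, here is a minimal sketch of a chi-squared comparison. It is not the code from the library's analysis: the helper names (`chi_square_statistic`, `sha256_int`, `byte_sum_hash`), the bucket count of 256, and the sequential test inputs are all assumptions chosen for illustration. For a uniform hash, the statistic should land near the number of buckets minus one; a poor hash yields a much larger value.

```python
import hashlib


def chi_square_statistic(hash_fn, inputs, num_buckets):
    """Pearson chi-squared statistic of bucket counts vs. a uniform distribution."""
    counts = [0] * num_buckets
    total = 0
    for item in inputs:
        counts[hash_fn(item) % num_buckets] += 1
        total += 1
    expected = total / num_buckets
    return sum((c - expected) ** 2 / expected for c in counts)


def sha256_int(data: bytes) -> int:
    # Illustrative helper: first 8 bytes of SHA-256 as an unsigned integer.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def byte_sum_hash(data: bytes) -> int:
    # Deliberately poor hash (hypothetical), used only as a contrast.
    return sum(data)


# Inputs that change in small increments, the case the answer warns about.
inputs = [str(i).encode() for i in range(100_000)]

good = chi_square_statistic(sha256_int, inputs, 256)
bad = chi_square_statistic(byte_sum_hash, inputs, 256)
print(f"sha256: {good:.1f}, byte sum: {bad:.1f}")
```

With 256 buckets there are 255 degrees of freedom, so a statistic far above 255 suggests the hash values are not uniformly spread over the buckets.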

The result is:

Use Murmur3 for the best trade-off between speed and uniformity. Do not use Murmur2, as it is not uniform for inputs that change in small increments.
Use a cryptographic hash function like SHA-256 if uniformity matters more than speed.
Apply the Kirsch-Mitzenmacher optimization to compute only 2 hash functions instead of k (hash_i = hash1 + i × hash2).
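The formula hash_i = hash1 + i × hash2 can be sketched as a tiny Bloom filter. This is a hedged illustration, not the library's implementation: the `BloomFilter` class, splitting one SHA-256 digest into hash1 and hash2, and forcing hash2 to be odd are assumptions made here. Murmur3 is not in Python's standard library, so SHA-256 stands in for it.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter deriving k indexes from two base hashes."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item: bytes):
        # Assumption: take hash1 and hash2 from two halves of one digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Assumption: make hash2 odd so it is never zero, which would
        # collapse all k indexes onto the same bit.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            # hash_i = hash1 + i * hash2, reduced to the bit-array size.
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[idx // 8] & (1 << (idx % 8))
            for idx in self._indexes(item)
        )


bf = BloomFilter(num_bits=1024, num_hashes=5)
bf.add(b"apple")
print(b"apple" in bf)  # True: a Bloom filter never gives false negatives
```

Only one digest is computed per item regardless of k, which is the point of the optimization.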

If your implementation uses Java, I would recommend our Bloom filter hash library. It is well documented and thoroughly tested. For details, including the benchmark results for different hash functions and their uniformity according to the chi-squared test, see the GitHub readme of the repo.

answered Oct 17 '22 06:10

DivineTraube

"Hash Functions" should provide you with graphical proof of why FNV would be a bad choice and why Murmur2 or one of Bob Jenkins' hashes would be a good choice.

answered Oct 17 '22 04:10

Guy Gordon
