How many hash functions does my Bloom filter need?

Wikipedia says:

An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions with a uniform random distribution.

I read the article, but what I don't understand is how k is determined. Is it a function of the table size?

Also, in hash tables I've written I used a simple but effective algorithm for automatically growing the table. Basically, whenever more than 50% of the buckets were filled, I would double the size of the table. I suspect you might still want to do this with a Bloom filter to reduce false positives. Is that correct?

asked Mar 18 '09 14:03

dicroce

2 Answers

Given:

n: how many items you expect to have in your filter (e.g. 216,553)
p: your acceptable false positive rate {0..1} (e.g. 0.01 → 1%)

we want to calculate:

m: the number of bits needed in the bloom filter
k: the number of hash functions we should apply

The formulas:

m = -n*ln(p) / (ln(2)^2)    (the number of bits)
k = m/n * ln(2)             (the number of hash functions)

In our case:

m = -216553*ln(0.01) / (ln(2)^2) = 997263 / 0.48045 = 2,075,686 bits (253 kB)
k = m/n * ln(2) = 2075686/216553 * 0.693147 = 6.64 hash functions (rounded up to 7 hash functions)
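
The two formulas above can be turned into a small helper. This is only a sketch: the function name `bloom_parameters` is made up for illustration, and rounding both m and k up is an assumption that follows the worked example (7 hash functions), not a rule from the formulas themselves.

```python
import math


def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Return (m, k): bits and hash functions for n items at false positive rate p.

    Hypothetical helper for illustration; both results are rounded up.
    """
    if n <= 0:
        raise ValueError("n must be a positive item count")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")

    # m = -n*ln(p) / (ln(2)^2)
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    # k = m/n * ln(2)
    k = math.ceil(m / n * math.log(2))
    return m, k


m, k = bloom_parameters(216_553, 0.01)
print(f"m = {m:,} bits (about {m / 8 / 1024:.0f} KiB), k = {k}")
```

Because the code uses full floating-point precision, its m comes out slightly lower than the hand calculation, which rounds ln(2)^2 to 0.48045. The number of hash functions is 7 either way.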

Note: Any code released into public domain. No attribution required.

answered Oct 01 '22 16:10

Ian Boyd

If you read further down in the Wikipedia article about Bloom filters, you will find a section called "Probability of false positives". It explains how the number of hash functions influences the probability of false positives, and it gives the formula for the k that minimizes that probability for a given m and n.

Quote from the Wikipedia article:

Obviously, the probability of false positives decreases as m (the number of bits in the array) increases, and increases as n (the number of inserted elements) increases. For a given m and n, the value of k (the number of hash functions) that minimizes the probability is k = (m/n) * ln(2).
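
The claim in the quote can be checked numerically. The sketch below assumes the standard approximation for the false positive probability, p ≈ (1 - e^(-k*n/m))^k, which is not part of the quoted text. It reuses the m and n from the worked example in the first answer, and the function name `false_positive_rate` is made up for illustration.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false positive probability: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


# m and n taken from the worked example in the first answer.
m, n = 2_075_686, 216_553

# Evaluate the approximation for a range of hash function counts.
rates = {k: false_positive_rate(m, n, k) for k in range(1, 13)}
best_k = min(rates, key=rates.get)

for k, rate in rates.items():
    marker = "  <- lowest" if k == best_k else ""
    print(f"k={k:2d}  p={rate:.5f}{marker}")
```

With these values the lowest rate falls at k = 7, the nearest integer above (m/n) * ln(2) ≈ 6.64, and it is close to the 1% target. Using fewer or more hash functions than that makes the false positive rate worse, not better.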

answered Oct 01 '22 17:10

f3lix
