Bloom filters and their multiple hash functions

I'm implementing a simple Bloom Filter as an exercise.

Bloom filters require multiple hash functions, which for practical purposes I don't have.

Assuming I want 3 hash functions, isn't it enough to hash the object I'm checking membership for (with murmur3), then add 1, 2 and 3 to that hash (one for each of the 3 hash functions) and hash each result again?

Since murmur3 has a very good avalanche effect (it really spreads out its results), wouldn't this be reasonable for all practical purposes?

Pseudo-code:

function generateHashes(obj) {
  long hash = murmur3_hash(obj);
  long hash1 = murmur3_hash(hash+1);
  long hash2 = murmur3_hash(hash+2);
  long hash3 = murmur3_hash(hash+3);
  return (hash1, hash2, hash3);
}
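For reference, a runnable Python sketch of this pseudo-code, with `k` as a parameter so it scales to more hash functions. Two stand-ins are assumed because Python's standard library has no murmur3: the object is hashed once with `hashlib.blake2b` truncated to 64 bits, and the re-hash of `hash + i` uses MurmurHash3's 64-bit finalizer (`fmix64`). The names `base_hash` and `generate_hashes` are illustrative only.

```python
import hashlib

MASK64 = (1 << 64) - 1  # wrap arithmetic at 64 bits, like a C/Java long


def fmix64(x: int) -> int:
    """MurmurHash3's 64-bit finalizer, a stand-in for murmur3_hash(long).

    Every step is invertible, so distinct 64-bit inputs give distinct outputs.
    """
    x &= MASK64
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & MASK64
    x ^= x >> 33
    x = (x * 0xC4CEB9FE1A85EC53) & MASK64
    x ^= x >> 33
    return x


def base_hash(obj: str) -> int:
    """Hash the object once (blake2b as a stand-in for murmur3) into a 64-bit int."""
    digest = hashlib.blake2b(obj.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def generate_hashes(obj: str, k: int = 3) -> list:
    """The question's scheme: hash once, then re-hash hash+1, hash+2, ..., hash+k."""
    h = base_hash(obj)
    return [fmix64(h + i) for i in range(1, k + 1)]


print(generate_hashes("apple"))          # three 64-bit values
print(len(generate_hashes("apple", 7)))  # 7
```

Scaling to more hash functions is just a larger `k`; the cost is one extra finalizer call per hash.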

If not, what would be a simple, useful approach? I'd like a solution that lets me easily scale to more hash functions if need be.

Thanks

asked Feb 11 '18 00:02

devoured elysium

2 Answers

AFAIK, the usual approach is not to actually use multiple hash functions. Rather, hash once and split the resulting hash into 2, 3, or however many parts your Bloom filter needs. For example, create a 128-bit hash and split it into two 64-bit hashes.

https://github.com/Claudenw/BloomFilter/wiki/Bloom-Filters----An-overview
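A minimal Python sketch of the hash-once-and-split idea, assuming `hashlib.blake2b` as the single wide hash (its digest size can be set anywhere from 1 to 64 bytes, so it can be cut into at most eight 64-bit parts). `split_hash` is a hypothetical helper, not taken from the linked article.

```python
import hashlib


def split_hash(obj: str, parts: int = 2, part_bytes: int = 8) -> list:
    """Hash once with a wide digest, then cut it into `parts` ints of `part_bytes` bytes each."""
    digest_size = parts * part_bytes
    if not 1 <= digest_size <= 64:  # blake2b supports digest sizes of 1..64 bytes
        raise ValueError("parts * part_bytes must be between 1 and 64")
    digest = hashlib.blake2b(obj.encode("utf-8"), digest_size=digest_size).digest()
    return [
        int.from_bytes(digest[i * part_bytes:(i + 1) * part_bytes], "little")
        for i in range(parts)
    ]


# A 128-bit hash split into two 64-bit hashes, as in the example above.
h1, h2 = split_hash("apple")
```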

answered Oct 14 '22 18:10

memo

The hash functions of a Bloom filter should be independent and sufficiently random. Murmur hash is great for this purpose, so your approach is correct, and you can generate as many new hashes as you need this way. For educational purposes it is fine.
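A rough Python check of why re-hashing `hash + 1`, `hash + 2`, ... gives usable values, assuming MurmurHash3's 64-bit finalizer `fmix64` as a stand-in for `murmur3_hash` on a `long`. With good avalanche, outputs for inputs that differ by one should differ in about half of their 64 bits. `average_bit_difference` is a hypothetical helper.

```python
MASK64 = (1 << 64) - 1


def fmix64(x: int) -> int:
    """MurmurHash3's 64-bit finalizer (stand-in for murmur3_hash on a 64-bit value)."""
    x &= MASK64
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & MASK64
    x ^= x >> 33
    x = (x * 0xC4CEB9FE1A85EC53) & MASK64
    x ^= x >> 33
    return x


def average_bit_difference(samples: int = 10_000) -> float:
    """Average number of differing bits between fmix64(h + 1) and fmix64(h + 2)."""
    total = 0
    for n in range(samples):
        h = fmix64(n + 12345)  # spread-out starting hashes, not just small integers
        diff = fmix64(h + 1) ^ fmix64(h + 2)
        total += bin(diff).count("1")
    return total / samples


print(average_bit_difference())  # expected to be near 32, i.e. half of 64 bits
```

This is only a quick sanity check of the avalanche claim, not a substitute for a proper hash test suite.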

But in the real world, running the hash function many times is expensive, so the usual approach is to generate the extra hashes cheaply instead of actually computing a new hash for each one.

To correct @memo: this is not done by splitting the hash into multiple parts, since the width of each hash should stay constant (and you can't split a 64-bit hash into more than 64 parts ;) ). Instead, you get two independent hashes and combine them.

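function generateHashes(obj) {
  // initialization phase: compute two base hashes
  long h1 = murmur3_hash(obj);
  long h2 = murmur3_hash(h1);

  int k = 3; // number of desired hash functions
  long hash[k];

  // generation phase: each extra hash is one multiply-add
  for (int i=0; i<k; i++) {
      hash[i] = h1 + (i*h2); // reduce modulo the filter size m before using it as a bit index
  }

  return hash;
}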

As you can see, this way creating each new hash is just a multiply-add operation.
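A minimal runnable Python sketch of a Bloom filter built on this two-hash combination. Assumptions: `hashlib.blake2b` (64-bit digest) stands in for murmur3, h2 is derived by hashing h1 as in the pseudo-code above, each combined value is reduced modulo the bit-array size `m`, and h2 is bumped to 1 if it is a multiple of `m` (otherwise every probe would hit the same bit). `BloomFilter` and `_hash64` are illustrative names.

```python
import hashlib


def _hash64(data: bytes) -> int:
    """64-bit hash of raw bytes (blake2b as a stand-in for murmur3)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


class BloomFilter:
    """Bloom filter of m bits whose k bit indices are (h1 + i*h2) mod m."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits packed into bytes

    def _indices(self, item: str):
        # initialization phase: two base hashes, h2 derived from h1
        h1 = _hash64(item.encode("utf-8"))
        h2 = _hash64(h1.to_bytes(8, "little")) % self.m
        # if h2 is a multiple of m, all k probes would land on the same bit
        h2 = h2 or 1
        # generation phase: one multiply-add per hash
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indices(item))


bf = BloomFilter(m=1024, k=3)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)
print("apple" in bf)   # True: added items are always found
print("durian" in bf)  # almost certainly False, but false positives are possible
```

Raising `k` only adds loop iterations; the two underlying hashes are still computed once per item.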

answered Oct 14 '22 17:10

igrinis

Tags: algorithm, hash, bloom-filter, murmurhash
