Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What hashing techniques to use when building a bloom filter in clojure?

I want to build a bloom filter in Clojure but I don't have much knowledge of all the hashing libraries that may be available to JVM based languages.

What should I use for the fastest (as opposed to most accurate) bloom map implementation in Clojure?

like image 767
jdoig Avatar asked Mar 04 '12 10:03

jdoig


1 Answers

Take a look at the Bloom Filter implementation in Apache Cassandra. This uses the very fast MurmurHash3 algorithm and combines two hashes (or two portions of the same hash, since upgrading to MurmurHash3 instead of MurmurHash2) in different ways to calculate the desired number of hashes.

The combinatorial generation approach is described in this paper

and here's a snippet from the Cassandra sourcecode:

    long[] hash = MurmurHash.hash3_x64_128(b, b.position(), b.remaining(), 0L);
    long hash1 = hash[0];
    long hash2 = hash[1];
    for (int i = 0; i < hashCount; ++i)
    {
        result[i] = Math.abs((hash1 + (long)i * hash2) % max);
    }

See also Bloomfilter and Cassandra = Why used and why hashed several times?

like image 94
DNA Avatar answered Nov 13 '22 11:11

DNA