I want to build a bloom filter in Clojure but I don't have much knowledge of all the hashing libraries that may be available to JVM based languages.
What should I use for the fastest (as opposed to most accurate) bloom map implementation in Clojure?
Take a look at the Bloom Filter implementation in Apache Cassandra. This uses the very fast MurmurHash3 algorithm and combines two hashes (or two portions of the same hash, since upgrading to MurmurHash3 instead of MurmurHash2) in different ways to calculate the desired number of hashes.
The combinatorial generation approach is described in this paper
and here's a snippet from the Cassandra sourcecode:
long[] hash = MurmurHash.hash3_x64_128(b, b.position(), b.remaining(), 0L);
long hash1 = hash[0];
long hash2 = hash[1];
for (int i = 0; i < hashCount; ++i)
{
result[i] = Math.abs((hash1 + (long)i * hash2) % max);
}
See also Bloomfilter and Cassandra = Why used and why hashed several times?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With