I am in the process of learning about bloom filters and I am looking through various hash functions in JavaScript.
For example I found this one in another Stack Overflow answer:
Found here https://stackoverflow.com/a/7616484/5217568)
String.prototype.hashCode = function() {
var hash = 0, i, chr, len;
if (this.length == 0) return hash;
for (i = 0, len = this.length; i < len; i++) {
chr = this.charCodeAt(i);
hash = ((hash << 5) - hash) + chr;
hash |= 0; // Convert to 32bit integer
}
return hash;
};
If I run:
String.prototype.call(null, "hello")
I get the numerical value of : 99162322 (two other hash functions got me: 1335831723, and 120092131).
Now If I create a hypothetical bloom filter with 3 hash functions, and 18 indices (k=3, m=18), How are these large values indexed in an array with indices from 0-17?
Use the remainder/modulo operator %
to wrap a randomly-generated value within a certain bound.
If you have 18 elements (indices 0 to 17), you could get an index with 99162322 % 18
(16
).
If the number of hash value is not a multiple of the number of indices, the result will be biased. For example, if your hash value is one of the five values from 0 to 4, but you were mapping it onto the three indices from 0 to 2, it would be biased towards 0 (0 % 3
, 3 % 3
) and 1 (1 % 3
or 4 % 3
) over 2 (only 2 % 3
). Depending on your needs, the bias may be acceptable if the number of hash values is sufficiently larger than the number of indices. If you want to to avoid it you'll need a scheme to generate a new hash input if the hash result is from the bias-inducing range. Something like this:
function hashIndex(string, length, hashValueCount) {
var minBiasedIndex = hashValueCount - (hashValueCount % length);
for (var i = 0; ; i++) {
var hashInput = string + ":" + String(i);
var hashResult = hash(hashInput);
if (hashResult < minBiasedIndex) {
return hashResult % length;
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With