Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use result from a hash function to get an array index?

I am in the process of learning about bloom filters and I am looking through various hash functions in JavaScript.

For example I found this one in another Stack Overflow answer:

Found here https://stackoverflow.com/a/7616484/5217568)

String.prototype.hashCode = function() {
  var hash = 0, i, chr, len;
  if (this.length == 0) return hash;
  for (i = 0, len = this.length; i < len; i++) {
    chr   = this.charCodeAt(i);
    hash  = ((hash << 5) - hash) + chr;
    hash |= 0; // Convert to 32bit integer
  }
  return hash;
};

If I run:

String.prototype.call(null, "hello") 

I get the numerical value of : 99162322 (two other hash functions got me: 1335831723, and 120092131).

Now If I create a hypothetical bloom filter with 3 hash functions, and 18 indices (k=3, m=18), How are these large values indexed in an array with indices from 0-17?

like image 310
jmancherje Avatar asked Oct 30 '22 14:10

jmancherje


1 Answers

Use the remainder/modulo operator % to wrap a randomly-generated value within a certain bound.

If you have 18 elements (indices 0 to 17), you could get an index with 99162322 % 18 (16).

If the number of hash value is not a multiple of the number of indices, the result will be biased. For example, if your hash value is one of the five values from 0 to 4, but you were mapping it onto the three indices from 0 to 2, it would be biased towards 0 (0 % 3, 3 % 3) and 1 (1 % 3 or 4 % 3) over 2 (only 2 % 3). Depending on your needs, the bias may be acceptable if the number of hash values is sufficiently larger than the number of indices. If you want to to avoid it you'll need a scheme to generate a new hash input if the hash result is from the bias-inducing range. Something like this:

function hashIndex(string, length, hashValueCount) {
  var minBiasedIndex = hashValueCount - (hashValueCount % length);
  for (var i = 0; ; i++) {
    var hashInput = string + ":" + String(i);
    var hashResult = hash(hashInput);
    if (hashResult < minBiasedIndex) {
      return hashResult % length;
    }
  }
}
like image 172
Jeremy Avatar answered Nov 12 '22 16:11

Jeremy