I was digging around in .NET's implementation of Dictionaries, and found one function that I'm curious about: HashHelpers.GetPrime
.
Most of what it does is quite straightforward, it looks for a prime number above some minimum which is passed to it as a parameter, apparently for the specific purpose of being used as a number of buckets in a hashtable-like structure. But there's one mysterious part:
if (HashHelpers.IsPrime(j) && (j - 1) % 101 != 0)
{
return j;
}
What is the purpose of the (j - 1) % 101 != 0
check? i.e. Why do we apparently want to avoid having a number of buckets which is 1 more than a multiple of 101?
The comments explain it pretty well:
‘InitHash’ is basically an implementation of classic DoubleHashing (see http://en.wikipedia.org/wiki/Double_hashing)
1) The only ‘correctness’ requirement is that the ‘increment’ used to probe a. Be non-zero b. Be relatively prime to the table size ‘hashSize’. (This is needed to insure you probe all entries in the table before you ‘wrap’ and visit entries already probed)
2) Because we choose table sizes to be primes, we just need to insure that the increment is 0 < incr < hashSize
Thus this function would work: Incr = 1 + (seed % (hashSize-1))
While this works well for ‘uniformly distributed’ keys, in practice, non-uniformity is common. In particular in practice we can see ‘mostly sequential’ where you get long clusters of keys that ‘pack’. To avoid bad behavior you want it to be the case that the increment is ‘large’ even for ‘small’ values (because small values tend to happen more in practice). Thus we multiply ‘seed’ by a number that will make these small values bigger (and not hurt large values). We picked HashPrime (101) because it was prime, and if ‘hashSize-1’ is not a multiple of HashPrime (enforced in GetPrime), then incr has the potential of being every value from 1 to hashSize-1. The choice was largely arbitrary.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With