Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the purpose of this line in HashHelpers.GetPrime?

I was digging around in .NET's implementation of Dictionaries, and found one function that I'm curious about: HashHelpers.GetPrime.

Most of what it does is quite straightforward, it looks for a prime number above some minimum which is passed to it as a parameter, apparently for the specific purpose of being used as a number of buckets in a hashtable-like structure. But there's one mysterious part:

if (HashHelpers.IsPrime(j) && (j - 1) % 101 != 0)
{
    return j;
}

What is the purpose of the (j - 1) % 101 != 0 check? i.e. Why do we apparently want to avoid having a number of buckets which is 1 more than a multiple of 101?

like image 232
Ben Aaronson Avatar asked Aug 14 '14 15:08

Ben Aaronson


1 Answers

The comments explain it pretty well:

‘InitHash’ is basically an implementation of classic DoubleHashing (see http://en.wikipedia.org/wiki/Double_hashing)

1) The only ‘correctness’ requirement is that the ‘increment’ used to probe a. Be non-zero b. Be relatively prime to the table size ‘hashSize’. (This is needed to insure you probe all entries in the table before you ‘wrap’ and visit entries already probed)

2) Because we choose table sizes to be primes, we just need to insure that the increment is 0 < incr < hashSize

Thus this function would work: Incr = 1 + (seed % (hashSize-1))

While this works well for ‘uniformly distributed’ keys, in practice, non-uniformity is common. In particular in practice we can see ‘mostly sequential’ where you get long clusters of keys that ‘pack’. To avoid bad behavior you want it to be the case that the increment is ‘large’ even for ‘small’ values (because small values tend to happen more in practice). Thus we multiply ‘seed’ by a number that will make these small values bigger (and not hurt large values). We picked HashPrime (101) because it was prime, and if ‘hashSize-1’ is not a multiple of HashPrime (enforced in GetPrime), then incr has the potential of being every value from 1 to hashSize-1. The choice was largely arbitrary.

like image 138
Yuval Itzchakov Avatar answered Nov 13 '22 04:11

Yuval Itzchakov