Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why .Net dictionaries resize to prime numbers?

According to this question a .Net dictionary resizes its allocated space to prime numbers that are at least twice the current size. Why is it important to use prime numbers and not just twice the current size? (I tried to use my google-fu powers to find an answer, but to no avail)

like image 405
maayank Avatar asked Jan 09 '11 09:01

maayank


People also ask

Why are prime numbers important in coding?

Prime numbers are often used in cryptography, and as a method for generating some kinds of random numbers. For example, in RSA encryption, two large, arbitrary prime numbers are multiplied to generate a semiprime, from which a public encryption key is generated.

Is prime for large numbers?

A prime number is a positive integer, excluding 1, with no divisors other than 1 and itself. According to Euclid's theorem there are infinitely many prime numbers, so there is no largest prime.

How Dictionary works internally in java?

Dictionary actually has two arrays with the same number of items: Buckets array – stores the index of which the object is stored in the entries array. Entries array – stores the actual items within a special data structure (if it's a reference type – the reference to the item will be stored)<o:p>


1 Answers

The bucket in which an element is put is determined by (hash & 0x7FFFFFF) % capacity. This needs to be uniformly distributed. From this it follows that if multiple entries which are a multiple of a certain base (hash1 = x1 * base, hash2 = x2 * base,...) where base and capacity aren't coprime (greatest common divisor > 1) some slots are over used, and some are never used. Since prime numbers are coprime to any number except themselves, they have relatively good chances of achieving a good distribution.

One particularly nice property of this is that for capacity > 30 the contribution of each bit to the hashcode is different. So if the variation of the hash is concentrated in only a few bits it will still lead to a good distribution. This explains why capacities which are powers of two are bad: they mask out the high bits. A set of numbers where only the high bits are different isn't that unlikely.

Personally I think they choose that function badly. It contains an expensive modulo operation and if the entries are multiples of the prime-capacity its performance breaks down. But it seems to be good enough for most applications.

like image 127
CodesInChaos Avatar answered Nov 09 '22 14:11

CodesInChaos