Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

.NET ConcurrentDictionary initial capacity set to arbitrary prime number rather than expected capacity in MSDN example documentation. Why?

I was just looking at the MSDN documentation for ConcurrentDictionary, and I saw this in the "example" code:

// We know how many items we want to insert into the ConcurrentDictionary.
// So set the initial capacity to some prime number above that, to ensure that
// the ConcurrentDictionary does not need to be resized while initializing it.
int NUMITEMS = 64;
int initialCapacity = 101;

For reference, the dictionary in the MSDN example is initialized as follows:

ConcurrentDictionary<int, int> cd = new ConcurrentDictionary<int, int>(Environment.ProcessorCount * 2, initialCapacity);
for (int i = 0; i < NUMITEMS; i++) cd[i] = i * i;

In the example, the dictionary is never going to contain more than 64 items. Why not set the initial capacity to 64, rather than to a seemingly arbitrary prime number greater than 64? The comment says that this is to ensure that the dictionary does not need to be resized while initializing it, but why would a similar dictionary with initialCapacity=64 need to be resized? Why was this prime number chosen?

like image 219
Matthew King Avatar asked Nov 15 '10 05:11

Matthew King


1 Answers

Dictionary or hash table relies on hashing the key to get a smaller index to look up into corresponding store (array). So choice of hash function is very important. Typical choice is to get hash code of a key (so that we get good random distribution) and then divide the code by a prime number and use reminder to index into fixed number of buckets. This allows to convert arbitrarily large hash codes into a bounded set of small numbers for which we can define an array to look up into. So its important to have array size in prime number and then the best choice for the size become the prime number that is larger than the required capacity. And that's exactly dictionary implementation does.

So basically any Modulo N (n being prime number) dictionary implementation will need its capacity to be in prime number. So if you say, required capacity is X then these implementation will typically choose next larger primer number than required capacity.

like image 122
VinayC Avatar answered Oct 14 '22 05:10

VinayC