I keep running into a lack of guidance choosing proper initial capacities for ConcurrentDictionary<TKey, TValue>.
My general use case is those situations where you really want to do something like the following, but cannot:
public static class StaticCache<T>
{
public static readonly Action CompiledExpression = ...;
}
This generic-based approach avoids a dictionary lookup, but can only be used if we always know the required type at compile time. If we only have a Type known at runtime, we can no longer use this approach. The next contender is a ConcurrentDictionary<TKey, TValue>.
The documentation states:
The default capacity (DEFAULT_CAPACITY), which represents the initial number of buckets, is a trade-off between the size of a very small dictionary and the number of resizes when constructing a large dictionary. Also, the capacity should not be divisible by a small prime number. The default capacity is 31.
My number of expected elements tends to be relatively small. Sometimes as small as 3 or 5, sometimes perhaps 15. As such:
Since the default initial capacity is 31, we can potentially reduce our impact on the cache (as well as increase the likelihood for our dictionary to remain in the cache) by using a smaller initial capacity.
This raises the following questions:
What does the capacity actually mean?
What does and does not constitute "a small prime"? Apparently, 31 does not. Does 11? Does 17? Does 23?
If we do happen to want a capacity near a small prime, what capacity can we choose instead? Do we simply choose the nearest non-prime number, or are primes better for capacities and should we really choose a greater prime instead?
In the reference source for ConcurrentDictionary<TKey, TValue> you can see:
Node[] buckets = new Node[capacity];
So, the capacity is the effective size of the hash table. No "fullness" is considered. The only pre-processing of this number is:
if (capacity < concurrencyLevel)
{
capacity = concurrencyLevel;
}
where concurrencyLevel is either defined by you through a constructor parameter or is the default concurrency level defined as PlatformHelper.ProcessorCount.
The capacity is treated differently in Dictionary<TKey,TValue>. Here it is initialized with
private void Initialize(int capacity) {
int size = HashHelpers.GetPrime(capacity);
buckets = new int[size];
...
}
and HashHelpers.GetPrime gets the smallest prime which is greater than or equal to the specified capacity. Primes up to 7199369 are taken from a precalculated array. Lager ones are calculated "the hard way". It is interesting to note that the smallest considered prime is 3.
Unfortunately, HashHelpers is an internal class.
If I understand it right, both implementations resize the hash table based on the number of collisions and not based on a specific fill-factor ("fullness").
If you want to
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With