If I have, say, 100 items that'll be stored in a dictionary, should I initialise it thus?
var myDictionary = new Dictionary<Key, Value>(100);
My understanding is that the .NET dictionary internally resizes itself when it reaches a given loading, and that the loading threshold is defined as a ratio of the capacity.
That would suggest that if 100 items were added to the above dictionary, then it would resize itself when one of the items was added. Resizing a dictionary is something I'd like to avoid as it has a performance hit and is wasteful of memory.
The probability of hashing collisions is proportional to the loading in a dictionary. Therefore, even if the dictionary does not resize itself (and uses all of its slots) then the performance must degrade due to these collisions.
How should one best decide what capacity to initialise the dictionary to, assuming you know how many items will be inside the dictionary?
In .NET 4.5 the initial capacity for a Dictionary is 3. Lists have a default capacity of 0, but the capacity goes to 4 after the first item is added to the list.
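The default capacities mentioned above can be checked directly. The following is a minimal sketch, assuming C# 9+ top-level statements and .NET Core 2.1 or later (Dictionary has no Capacity property, so the sketch uses EnsureCapacity(0), which returns the current capacity and is not available on .NET Framework). The variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// List<T> exposes its capacity directly.
var list = new List<int>();
int capacityWhenEmpty = list.Capacity;      // no backing array allocated yet
list.Add(42);
int capacityAfterFirstAdd = list.Capacity;  // backing array allocated on the first Add

Console.WriteLine($"List capacity: {capacityWhenEmpty} -> {capacityAfterFirstAdd}");

// Dictionary<TKey,TValue> has no Capacity property.
// EnsureCapacity(0) never grows the dictionary and returns its current capacity.
var dict = new Dictionary<int, int>();
dict.Add(1, 1);
int dictionaryCapacity = dict.EnsureCapacity(0);

Console.WriteLine($"Dictionary capacity after first Add: {dictionaryCapacity}");
```

The exact numbers are implementation details of the runtime, so treat them as observations rather than guarantees.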
The maximum capacity of a dictionary is up to 2 billion elements on a 64-bit system, provided the enabled attribute of the gcAllowVeryLargeObjects configuration element is set to true in the run-time environment.
In C#, Dictionary is a generic collection which is generally used to store key/value pairs. It works much like the non-generic Hashtable. The advantage of Dictionary is that it is a generic type. Dictionary is defined in the System.Collections.Generic namespace.
No, dictionaries are not thread safe (unless you perform your own locking). Use one of the concurrent collections, such as ConcurrentDictionary, instead.
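As a rough illustration of the concurrent alternative, here is a minimal sketch using ConcurrentDictionary from System.Collections.Concurrent, assuming C# 9+ top-level statements. The key scheme (i % 10) and the variable names are made up for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var counts = new ConcurrentDictionary<int, int>();

// 1,000 increments issued from multiple threads, spread across 10 keys.
// AddOrUpdate inserts 1 for a new key, or applies the update delegate to the existing value.
Parallel.For(0, 1000, i =>
{
    counts.AddOrUpdate(i % 10, 1, (key, current) => current + 1);
});

// Sum the per-key counters to confirm no increment was lost.
int total = 0;
foreach (var pair in counts)
{
    total += pair.Value;
}

Console.WriteLine($"{counts.Count} keys, {total} increments recorded");
```

Doing the same with a plain Dictionary from several threads without a lock can corrupt its internal state, which is why the concurrent type (or explicit locking) is needed.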
What you should initialize the dictionary capacity to depends on two factors: (1) the distribution of the GetHashCode function, and (2) how many items you have to insert.
Your hash function should either be randomly distributed, or it should be specially formulated for your set of input. Let's assume the first, but if you are interested in the second, look up perfect hash functions.
If you have 100 items to insert into the dictionary, a randomly distributed hash function, and you set the capacity to 100, then when you insert the ith item into the hash table you have a (i-1) / 100 probability that the ith item will collide with another item upon insertion. If you want to lower this probability of collision, increase the capacity. Doubling the expected capacity halves the chance of collision.
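To put numbers on this, the sketch below sums the per-insertion probability (i-1) / capacity over all insertions, giving the expected number of colliding insertions under the simplified model described above. It assumes C# 9+ top-level statements; ExpectedCollisions is a hypothetical helper, not a framework API, and the model ignores how the real Dictionary rounds capacity and chains entries.

```csharp
using System;

// Expected number of insertions that hit an occupied slot, using the
// simplified model above: the i-th insertion collides with probability (i - 1) / capacity.
static double ExpectedCollisions(int itemCount, int capacity)
{
    long occupiedSlotSum = 0;
    for (int i = 1; i <= itemCount; i++)
    {
        occupiedSlotSum += i - 1;   // numerator of (i - 1) / capacity
    }
    return (double)occupiedSlotSum / capacity;
}

double atCapacity100 = ExpectedCollisions(100, 100); // 4950 / 100
double atCapacity200 = ExpectedCollisions(100, 200); // 4950 / 200

Console.WriteLine($"capacity 100: {atCapacity100}");
Console.WriteLine($"capacity 200: {atCapacity200}");
```

For 100 items this works out to 49.5 expected collisions at capacity 100 and 24.75 at capacity 200, which matches the statement that doubling the capacity halves the chance of collision.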
Furthermore, if you know how frequently you are going to be accessing each item in the dictionary you may want to insert the items in order of decreasing frequency since the items that you insert first will be on average faster to access.
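On the question's worry that a dictionary created with capacity 100 might resize while 100 items are added, one way to check is to compare the capacity before and after filling. This is a minimal sketch, assuming C# 9+ top-level statements and .NET Core 2.1 or later, where EnsureCapacity(0) returns the current capacity without growing the dictionary. The names are illustrative.

```csharp
using System;
using System.Collections.Generic;

const int ItemCount = 100;
var presized = new Dictionary<int, string>(ItemCount);

// Capacity actually allocated for the requested size (may be larger than requested).
int capacityBefore = presized.EnsureCapacity(0);

for (int i = 0; i < ItemCount; i++)
{
    presized.Add(i, i.ToString());
}

// If a resize had happened during the fill, this value would differ from capacityBefore.
int capacityAfter = presized.EnsureCapacity(0);

Console.WriteLine($"Requested {ItemCount}, capacity before fill: {capacityBefore}, after fill: {capacityAfter}");
```

If the two values match, no resize took place while the requested number of items was added.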
0.001ms is 1 microsecond. The resolution of Stopwatch is system-dependent, so don't stress over differences at the microsecond level.
The table below shows the effect of setting capacity in the Dictionary<String,String> constructor.
| | .NET Framework 4.8 | .NET 5 |
|---|---|---|
| With initial capacity of 1,000,000 | | |
| Constructor | 1.170ms | 0.003ms |
| Fill in loop | 353.420ms | 181.846ms |
| Total time | 354.590ms | 181.880ms |
| Without initial capacity | | |
| Constructor | 0.001ms | 0.001ms |
| Fill in loop | 400.158ms | 228.687ms |
| Total time | 400.159ms | 228.688ms |
| Speedup from setting initial capacity | | |
| Time | 45.569ms | 46.8ms |
| Speedup % | 11% | 20% |
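The benchmark was also run at smaller sizes (10, 100, 1000, 10000, and 100000) and the 10-20% speedup was also observed at those sizes, but in absolute terms a 20% speedup on an operation that takes a fraction of a millisecond is a very small saving.
The fill times include the allocation of String instances (caused by i.ToString()).
A reference type (String) was used for both keys and values, which uses the same size as a native pointer (8 bytes on x64), so results will be different when re-run if the keys and/or values use a larger value-type (such as a ValueTuple). There are other type-size factors to consider as well.
// Namespaces used by the benchmark:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;
// Warmup: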
{
    var foo1 = new Dictionary<string, string>();
    var foo2 = new Dictionary<string, string>( capacity: 10_000 );
    foo1.Add( "foo", "bar" );
    foo2.Add( "foo", "bar" );
}
Stopwatch sw = Stopwatch.StartNew();
// Pre-set capacity:
TimeSpan pp_initTime;
TimeSpan pp_populateTime;
{
    var dict1 = new Dictionary<string, string>(1000000);
    pp_initTime = sw.GetElapsedAndRestart();
    for (int i = 0; i < 1000000; i++)
    {
        dict1.Add(i.ToString(), i.ToString());
    }
}
pp_populateTime = sw.GetElapsedAndRestart();
// Default (empty) capacity:
TimeSpan empty_initTime;
TimeSpan empty_populateTime;
{
    var dict2 = new Dictionary<string, string>();
    empty_initTime = sw.GetElapsedAndRestart();
    for (int i = 0; i < 1000000; i++)
    {
        dict2.Add(i.ToString(), i.ToString());
    }
}
empty_populateTime = sw.GetElapsedAndRestart();
// Report the results:
Console.WriteLine("Pre-set capacity. Init time: {0:N3}ms, Fill time: {1:N3}ms, Total time: {2:N3}ms.", pp_initTime.TotalMilliseconds, pp_populateTime.TotalMilliseconds, ( pp_initTime + pp_populateTime ).TotalMilliseconds );
Console.WriteLine("Empty capacity. Init time: {0:N3}ms, Fill time: {1:N3}ms, Total time: {2:N3}ms.", empty_initTime.TotalMilliseconds, empty_populateTime.TotalMilliseconds, ( empty_initTime + empty_populateTime ).TotalMilliseconds );
// Extension methods (these must be declared inside a static class):
public static class StopwatchExtensions
{
    // Note: AggressiveOptimization needs .NET Core 3.0 or later; remove that flag when targeting .NET Framework.
    [MethodImpl( MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization )]
    public static TimeSpan GetElapsedAndRestart( this Stopwatch stopwatch )
    {
        // Read the elapsed time, then restart so the next measurement starts from zero.
        TimeSpan elapsed = stopwatch.Elapsed;
        stopwatch.Restart();
        return elapsed;
    }
}
Original benchmark, without a cold-startup warmup phase and with lower-precision DateTime timing:
With capacity set (dict1), total time is 1220.778ms (for construction and population).
Without capacity set (dict2), total time is 1502.490ms (for construction and population).
static void Main(string[] args)
{
    const int ONE_MILLION = 1000000;
    DateTime start1 = DateTime.Now;
    {
        var dict1 = new Dictionary<string, string>( capacity: ONE_MILLION );
        for (int i = 0; i < ONE_MILLION; i++)
        {
            dict1.Add(i.ToString(), i.ToString());
        }
    }
    DateTime stop1 = DateTime.Now;
    DateTime start2 = DateTime.Now;
    {
        var dict2 = new Dictionary<string, string>();
        for (int i = 0; i < ONE_MILLION; i++)
        {
            dict2.Add(i.ToString(), i.ToString());
        }
    }
    DateTime stop2 = DateTime.Now;
    Console.WriteLine("Time with size initialized: " + (stop1.Subtract(start1)) + "\nTime without size initialized: " + (stop2.Subtract(start2)));
    Console.ReadLine();
}