Why is Dictionary.First() so slow?

Question

Not a real question because I already found out the answer, but still interesting thing.

I always thought that hash table is the fastest associative container if you hash properly.

However, the following code is terribly slow. It executes only about 1 million iterations and takes more than 2 minutes of time on a Core 2 CPU.

The code does the following: it maintains the collection todo of items it needs to process. At each iteration it takes an item from this collection (doesn't matter which item), deletes it, processes it if it wasn't processed (possibly adding more items to process), and repeats this until there are no items to process.

The culprit seems to be the Dictionary.Keys.First() operation.

The question is why is it slow?

Stopwatch watch = new Stopwatch();
watch.Start();

HashSet<int> processed = new HashSet<int>();
Dictionary<int, int> todo = new Dictionary<int, int>();

todo.Add(1, 1);
int iterations = 0;

int limit = 500000;
while (todo.Count > 0)
{
    iterations++;
    var key = todo.Keys.First();
    var value = todo[key];
    todo.Remove(key);
    if (!processed.Contains(key))
    {
        processed.Add(key);
        // process item here
        if (key < limit) { todo[key + 13] = value + 1; todo[key + 7] = value + 1; }
        // doesn't matter much how
    }
}
Console.WriteLine("Iterations: {0}; Time: {1}.", iterations, watch.Elapsed);

This results in:

Iterations: 923007; Time: 00:02:09.8414388.

Simply changing Dictionary to SortedDictionary yields:

Iterations: 499976; Time: 00:00:00.4451514.

300 times faster while having only 2 times less iterations.

The same happens in java. Used HashMap instead of Dictionary and keySet().iterator().next() instead of Keys.First().

SLaks · Accepted Answer

Dictionary<TKey, TValue> maintains a hash table.

Its enumerator will loop through the buckets in the hash table until it finds a non-empty bucket, then return the value in that bucket.
Once the dictionary grows large, this operation becomes expensive.
In addition, removing an item from the dictionary doesn't shrink the buckets array, so the First() call gets slower as you remove items. (Because it has to loop further to find a non-empty bucket)

Therefore, repeatedly calling First() and removing is O(n²).

By the way, you can avoid the value lookup like this: (This will not make it noticeably faster)

var kvp = todo.First();

//Use kvp.Key and kcp.Value

Why is Dictionary.First() so slow?

Tags:

performance

.net

algorithm

hashtable

Rotsor

1 Answers

SLaks

Recent Activity

Donate For Us

Why is Dictionary.First() so slow?

Tags:

performance

.net

algorithm

hashtable

Rotsor

1 Answers

SLaks

Related questions

Recent Activity

Donate For Us