I have a script that performs a lot of deletions on a dict and eventually iterates over it.
I've managed to reduce it to a simple benchmark:
> py -m timeit -s "a = {i:i for i in range(10000000)};[a.pop(i) for i in range(10000000-1)]" "next(iter(a))"
10 loops, best of 5: 30.8 msec per loop
How come fetching a single key becomes this slow after I've deleted all of the previous entries?
When keys are removed, their slots are not freed but replaced with dummy markers, and upon traversal the iterator keeps skipping these dummies until it finds the next real-valued key. Since there may be lots of such empty slots to walk over without yielding anything, iterating a dictionary is generally slower than iterating its array/list counterpart.
A little benchmark shows me that iterating over a list is definitely faster.
Dictionaries in Python find keys in O(1) on average. But complexity is not the only factor in execution time. For example, accessing a list item by its index is also O(1), yet it is considerably faster than accessing a dictionary by its key.
Lookup is a different matter: searching a list for a value is O(n), so a dictionary lookup can be around 6.6 times faster than a list search even with just 100 items.
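To make the distinction concrete, here is a small sketch of my own (not from the original post) that times both access patterns with the timeit module:

import timeit

setup = "lst = list(range(100)); dct = {i: i for i in range(100)}"

# Positional access: both O(1), but the list needs no hashing or probing.
t_list = timeit.timeit("lst[50]", setup=setup)
t_dict = timeit.timeit("dct[50]", setup=setup)
print(f"list index access: {t_list:.3f}s   dict key access: {t_dict:.3f}s")

# Membership search: 'in' on a list scans linearly (O(n)), while 'in'
# on a dict goes through the hash table (O(1) on average).
t_list_in = timeit.timeit("99 in lst", setup=setup)
t_dict_in = timeit.timeit("99 in dct", setup=setup)
print(f"list search: {t_list_in:.3f}s   dict search: {t_dict_in:.3f}s")

The 6.6x figure above refers to the second comparison, and the gap grows with the number of items, since the list scan is linear.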
Since CPython 3.6, dictionaries are implemented with an internal hash index plus a dense array of entries.
When a key is removed from the dictionary, its entry in that array is not compacted away but replaced with a dummy value marking it as deleted.
Upon iteration, the iterator skips all of these dummy values one by one until it finds the next real item.
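You can observe the dead slots indirectly in CPython with sys.getsizeof: the buffer is not compacted on deletion, so the dict's size stays the same even after nearly every key has been popped. A quick sketch (my own illustration, not part of the answer):

import sys

a = {i: i for i in range(1_000_000)}
before = sys.getsizeof(a)

# Pop every key except the last one, as in the question's benchmark.
for i in range(1_000_000 - 1):
    a.pop(i)

# In CPython, deletion only marks slots as dummies; the buffer is
# reused as-is, so iteration still has to walk past the dead slots.
print(len(a))                      # 1
print(before, sys.getsizeof(a))   # same size before and after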
That's why, if you skip the first value and remove only the rest, a surviving key (0) sits at the very beginning of the entries array, so the iterator finds it immediately and iteration is as fast as over a single-item dictionary:
> py -m timeit -s "a = {i:i for i in range(10000000)};[a.pop(i) for i in range(1,10000000-1)]" "next(iter(a))"
1000000 loops, best of 5: 219 nsec per loop
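This also suggests a practical workaround, sketched below on the question's benchmark (my own addition, not part of the answer): rebuilding the dictionary copies only the live entries into a fresh, compact table, so iteration becomes fast again.

import timeit

setup = (
    "a = {i: i for i in range(10_000_000)};"
    "[a.pop(i) for i in range(10_000_000 - 1)]"
)

# Iterating the original dict has to skip ~10 million dummy slots.
slow = timeit.timeit("next(iter(a))", setup=setup, number=100)

# dict(a) walks the dummies once, but the copy has no dead slots.
fast = timeit.timeit("next(iter(b))", setup=setup + "; b = dict(a)", number=100)

print(f"with dummies: {slow:.4f}s   after dict(a): {fast:.6f}s")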
For more information about the internal dictionary structure, see this wonderful answer.