Deleting an item from the while iterating through it will usually result in RuntimeError: dictionary changed size during iteration
exception:
d = {1: 2}
# exception raised
for k in d:
del d[k]
To be more precise, the deletion itself will succeed. However, to enter the next round of iteration, the interpreter has to call next(it)
, where it
is an iterator through the dictionary it obtained earlier. At that point, next()
will notice that the dictionary size changed, and complain.
So far so good. But what if we both delete and add an item to a dictionary:
d = {1: 1}
# no exception raised
for k in d:
# order of next two lines doesn't matter
d[k*10] = k*10
del d[k]
I am almost sure that this is not safe (the docs imply neither insert nor delete is allowed during iteration). Why is the interpreter allowing this code to run without error?
My only guess is that it's too expensive to check which iterators are invalidated whenever an insert or delete method is called. So dict
doesn't attempt to be perfect about raising this exception. Instead, it just keeps track of the size of the dictionary inside each iterator, and checks that it hasn't changed whenever the iterator is actually asked to move to the next item. Is there no approach that would enable full validation at a low cost?
One approach to ensure that an exception is raised whenever an attempt is made to insert or delete a key while in a loop, is to maintain the number of modifications made to the dictionary. Then the iterators can check that that number didn't change in their __next__
method (instead of verifying that dictionary size didn't change).
This code would do that. Using SafeDict
or its keys()
/ items()
/ values()
proxy, the loops become safe from an accidental insertion/deletion:
class SafeKeyIter:
def __init__(self, iterator, container):
self.iterator = iterator
self.container = container
try:
self.n_modifications = container.n_modifications
except AttributeError:
raise RuntimeError('container does not support safe iteration')
def __next__(self):
if self.n_modifications != self.container.n_modifications:
raise RuntimeError('container modified duration iteration')
return next(self.iterator)
def __iter__(self):
return self
class SafeView:
def __init__(self, view, container):
self.view = view
self.container = container
def __iter__(self):
return SafeKeyIter(self.view.__iter__(), self.container)
class SafeDict(dict):
def __init__(self, *args, **kwargs):
self.n_modifications = 0
super().__init__(*args, **kwargs)
def __setitem__(self, key, value):
if key not in self:
self.n_modifications += 1
super().__setitem__(key, value)
def __delitem__(self, key):
self.n_modifications += 1
super().__delitem__(key)
def __iter__(self):
return SafeKeyIter(super().__iter__(), self)
def keys(self):
return SafeView(super().keys(), self)
def values(self):
return SafeView(super().values(), self)
def items(self):
return SafeView(super().items(), self)
# this now raises RuntimeError:
d = SafeDict({1: 2})
for k in d:
d[k * 100] = 100
del d[k]
This doesn't seem too expensive, so I'm not sure why it's not implemented in CPython dict
. Perhaps the extra cost of updating n_modifications
on a dictionary was judged too high.
The simplest answer is because you delete 1 item and add 1 item so the fact that the size has changed actually never gets caught; the RuntimeError
is raised when there is a difference between the size of the iterator and the dictionary for that iterator:
if (di->di_used != d->ma_used) {
PyErr_SetString(PyExc_RuntimeError,
"dictionary changed size during iteration");
di->di_used = -1; /* Make this state sticky */
return NULL;
}
when you add one and remove one, di->di_used
stays the same to d->ma_used
(which gets incremented by one and decremented by one). The operations (del
and key add) are performed on the dict
object d
and, due to the balance of these operations, no mismatch is found in the previous if
clause I added.
But, if you add two keys, for example, you'll get the same error:
d = {1: 1}
for k in d:
del d[k]
d[1] = 1
d[2] = 2
RuntimeErrorTraceback (most recent call last)
<ipython-input-113-462571d7e0df> in <module>()
1 d = {1: 1}
2 # no exception raised
----> 3 for k in d:
4 # order of next two lines doesn't matter
5 del d[k]
RuntimeError: dictionary changed size during iteration
because realizing that size has changed is caught here. If, of course, you decrement twice, the same behavior as before occurs, it balances out.
As I iterated in the comment section, a check evaluating if insertions or deletions has occurred in a balanced way is not as trivial as checking if the size has simply changed. It also wouldn't make sense to me on two other accounts:
dict
s being ubiquitous).Overall I doubt adding this check would benefit; it is pretty well established to most that iterating over a collection while changing it is not the best idea.
Like grown-ups we should realize that Python shouldn't check everything for us and, instead, just don't do things when they have known unwanted effects.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With