Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why modifying dict during iteration doesn't always raise exception?

Deleting an item from the while iterating through it will usually result in RuntimeError: dictionary changed size during iteration exception:

d = {1: 2}
# exception raised
for k in d:
  del d[k]

To be more precise, the deletion itself will succeed. However, to enter the next round of iteration, the interpreter has to call next(it), where it is an iterator through the dictionary it obtained earlier. At that point, next() will notice that the dictionary size changed, and complain.

So far so good. But what if we both delete and add an item to a dictionary:

d = {1: 1}
# no exception raised
for k in d:
  # order of next two lines doesn't matter
  d[k*10] = k*10
  del d[k]

I am almost sure that this is not safe (the docs imply neither insert nor delete is allowed during iteration). Why is the interpreter allowing this code to run without error?

My only guess is that it's too expensive to check which iterators are invalidated whenever an insert or delete method is called. So dict doesn't attempt to be perfect about raising this exception. Instead, it just keeps track of the size of the dictionary inside each iterator, and checks that it hasn't changed whenever the iterator is actually asked to move to the next item. Is there no approach that would enable full validation at a low cost?

like image 640
max Avatar asked Dec 04 '16 05:12

max


2 Answers

One approach to ensure that an exception is raised whenever an attempt is made to insert or delete a key while in a loop, is to maintain the number of modifications made to the dictionary. Then the iterators can check that that number didn't change in their __next__ method (instead of verifying that dictionary size didn't change).

This code would do that. Using SafeDict or its keys() / items() / values() proxy, the loops become safe from an accidental insertion/deletion:

class SafeKeyIter:
    def __init__(self, iterator, container):
        self.iterator = iterator
        self.container = container
        try:
            self.n_modifications = container.n_modifications
        except AttributeError:
            raise RuntimeError('container does not support safe iteration')

    def __next__(self):
        if self.n_modifications != self.container.n_modifications:
            raise RuntimeError('container modified duration iteration')
        return next(self.iterator)

    def __iter__(self):
        return self


class SafeView:
    def __init__(self, view, container):
        self.view = view
        self.container = container

    def __iter__(self):
        return SafeKeyIter(self.view.__iter__(), self.container)

class SafeDict(dict):
    def __init__(self, *args, **kwargs):
        self.n_modifications = 0
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        if key not in self:
            self.n_modifications += 1
        super().__setitem__(key, value)

    def __delitem__(self, key):
        self.n_modifications += 1
        super().__delitem__(key)

    def __iter__(self):
        return SafeKeyIter(super().__iter__(), self)

    def keys(self):
        return SafeView(super().keys(), self)

    def values(self):
        return SafeView(super().values(), self)

    def items(self):
        return SafeView(super().items(), self)

# this now raises RuntimeError:
d = SafeDict({1: 2})
for k in d:
    d[k * 100] = 100
    del d[k]

This doesn't seem too expensive, so I'm not sure why it's not implemented in CPython dict. Perhaps the extra cost of updating n_modifications on a dictionary was judged too high.

like image 133
max Avatar answered Oct 20 '22 23:10

max


The simplest answer is because you delete 1 item and add 1 item so the fact that the size has changed actually never gets caught; the RuntimeError is raised when there is a difference between the size of the iterator and the dictionary for that iterator:

if (di->di_used != d->ma_used) {
    PyErr_SetString(PyExc_RuntimeError,
                    "dictionary changed size during iteration");
    di->di_used = -1; /* Make this state sticky */
    return NULL;
} 

when you add one and remove one, di->di_used stays the same to d->ma_used (which gets incremented by one and decremented by one). The operations (del and key add) are performed on the dict object d and, due to the balance of these operations, no mismatch is found in the previous if clause I added.

But, if you add two keys, for example, you'll get the same error:

d = {1: 1}
for k in d:
  del d[k]
  d[1] = 1
  d[2] = 2

RuntimeErrorTraceback (most recent call last)
<ipython-input-113-462571d7e0df> in <module>()
      1 d = {1: 1}
      2 # no exception raised
----> 3 for k in d:
      4   # order of next two lines doesn't matter
      5   del d[k]

RuntimeError: dictionary changed size during iteration

because realizing that size has changed is caught here. If, of course, you decrement twice, the same behavior as before occurs, it balances out.

As I iterated in the comment section, a check evaluating if insertions or deletions has occurred in a balanced way is not as trivial as checking if the size has simply changed. It also wouldn't make sense to me on two other accounts:

  • If people do indeed choose to alter a dictionary during iteration, odds are they won't be doing it in a balanced way so the check in place should suffice for the most common cases.
  • If you do decide to add more checks you'd be affecting the performance of pretty much every thing in Python (due to dicts being ubiquitous).

Overall I doubt adding this check would benefit; it is pretty well established to most that iterating over a collection while changing it is not the best idea.

Like grown-ups we should realize that Python shouldn't check everything for us and, instead, just don't do things when they have known unwanted effects.

like image 31
Dimitris Fasarakis Hilliard Avatar answered Oct 20 '22 22:10

Dimitris Fasarakis Hilliard