Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Safely iterating over WeakKeyDictionary and WeakValueDictionary

The documentation of Python 3.2's weakref module's WeakKeyDictionary and WeakValueDictionary have a note on iterating over these containers:

Note: Caution: Because a WeakKeyDictionary is built on top of a Python dictionary, it must not change size when iterating over it. This can be difficult to ensure for a WeakKeyDictionary because actions performed by the program during iteration may cause items in the dictionary to vanish “by magic” (as a side effect of garbage collection).

That seems rather dire as a specification of these container's behavior. Especially when running code that uses CPython's garbage collector (when using data structures that contain cycle) or using another Python implementation (e.g. Jython), then it sounds as if there is no safe way of iterating over these collections.

How can I safely iterate over these collections when the garbage collector may clear references at any point in my program? Having a solution for CPython is my priority but I'm interested about the issue on other implementations as well.

Is this maybe a safe way to iterate over a WeakKeyDictionary?

import weakref

d = weakref.WeakKeyDictionary()

...

for k, v in list(d.items()):
    ...
like image 297
Feuermurmel Avatar asked Sep 14 '12 15:09

Feuermurmel


3 Answers

It is actually safe to iterate over a WeakKeyDictionary, WeakValueDictionary, or WeakSet in Python 2.7 or Python 3.1+. They put in an iteration guard that prevents weakref callbacks from removing references from the underlying dict or set during iteration all the way back in 2010, but the docs never got updated.

With the guard in, if an entry dies before iteration reaches it, iteration will skip that entry, but it won't result in a segfault or a RuntimeError or anything. Dead entries will be added to a list of pending removals and handled later.

Here's the guard (not threadsafe, despite the comment):

class _IterationGuard:
    # This context manager registers itself in the current iterators of the
    # weak container, such as to delay all removals until the context manager
    # exits.
    # This technique should be relatively thread-safe (since sets are).

    def __init__(self, weakcontainer):
        # Don't create cycles
        self.weakcontainer = ref(weakcontainer)

    def __enter__(self):
        w = self.weakcontainer()
        if w is not None:
            w._iterating.add(self)
        return self

    def __exit__(self, e, t, b):
        w = self.weakcontainer()
        if w is not None:
            s = w._iterating
            s.remove(self)
            if not s:
                w._commit_removals()

Here's where the WeakKeyDictionary weakref callback checks the guard:

def remove(k, selfref=ref(self)):
    self = selfref()
    if self is not None:
        if self._iterating:
            self._pending_removals.append(k)
        else:
            del self.data[k]

And here's where WeakKeyDictionary.__iter__ sets the guard:

def keys(self):
    with _IterationGuard(self):
        for wr in self.data:
            obj = wr()
            if obj is not None:
                yield obj

__iter__ = keys

The same guard is used in the other iterators.


If this guard did not exist, calling list(d.items()) wouldn't be safe either. A GC pass could happen inside the items iterator and remove items from the dict during iteration. (The fact that list is written in C would provide no protection.)


Back in 2.6 and earlier, the safest way to iterate over a WeakKeyDictionary or WeakValueDictionary would have been to use items. items would return a list, and it would use the underlying dict's items method, which would have been (mostly?) not interruptible by GC. The dict API changes in 3.0 changed how keys/values/items worked, which is probably why the guard was introduced when it was.

like image 70
user2357112 supports Monica Avatar answered Oct 22 '22 14:10

user2357112 supports Monica


To be safe you have to keep a reference somewhere. Using the idiom:

for k,v in list(d.items()):

is not completely safe because, even though it will work most of the time, during the last iteration of the loop the list may be garbage-collected.

The correct way would be:

items = list(d.items())
for k,v in items:
    #do stuff that doesn't have a chance of destroying "items"
del items

If you use a WeakKeyDictionary you could simply store the keys, and store values if you use WeakValueDictionary.

On a side note: in python2 .items() already returns a list.

Ultimately it depends on what do you mean by "safe". If you simply mean that the iteration will proceed correctly(iterating once on all the elements), then:

for k,v in list(d.items()):

is safe, because the iteration over the dictionary is actually performed by list(d.items()), then you are only iterating over the list.

If you, instead, mean that during the iteration elements should not "disappear" from the dictionary as side-effect of the for-loop, then you must keep a strong reference until the end of the loop, and this requires you to store the list in a variable before starting the loop.

like image 37
Bakuriu Avatar answered Oct 22 '22 14:10

Bakuriu


Convert to strong references without using iteration first.

items = []
while d:
    try:
        items.append(d.popitem())
    except KeyError:
        pass

If it loses some keys during the while loop, it shouldn't cause problems.

Then you can iterate over items instead. When you're done, d.update(items) to put them back, and then del items.

like image 2
gilch Avatar answered Oct 22 '22 14:10

gilch