The documentation of Python 3.2's weakref
module's WeakKeyDictionary
and WeakValueDictionary
have a note on iterating over these containers:
Note: Caution: Because a WeakKeyDictionary is built on top of a Python dictionary, it must not change size when iterating over it. This can be difficult to ensure for a WeakKeyDictionary because actions performed by the program during iteration may cause items in the dictionary to vanish “by magic” (as a side effect of garbage collection).
That seems rather dire as a specification of these container's behavior. Especially when running code that uses CPython's garbage collector (when using data structures that contain cycle) or using another Python implementation (e.g. Jython), then it sounds as if there is no safe way of iterating over these collections.
How can I safely iterate over these collections when the garbage collector may clear references at any point in my program? Having a solution for CPython is my priority but I'm interested about the issue on other implementations as well.
Is this maybe a safe way to iterate over a WeakKeyDictionary?
import weakref
d = weakref.WeakKeyDictionary()
...
for k, v in list(d.items()):
...
It is actually safe to iterate over a WeakKeyDictionary
, WeakValueDictionary
, or WeakSet
in Python 2.7 or Python 3.1+. They put in an iteration guard that prevents weakref callbacks from removing references from the underlying dict or set during iteration all the way back in 2010, but the docs never got updated.
With the guard in, if an entry dies before iteration reaches it, iteration will skip that entry, but it won't result in a segfault or a RuntimeError or anything. Dead entries will be added to a list of pending removals and handled later.
Here's the guard (not threadsafe, despite the comment):
class _IterationGuard:
# This context manager registers itself in the current iterators of the
# weak container, such as to delay all removals until the context manager
# exits.
# This technique should be relatively thread-safe (since sets are).
def __init__(self, weakcontainer):
# Don't create cycles
self.weakcontainer = ref(weakcontainer)
def __enter__(self):
w = self.weakcontainer()
if w is not None:
w._iterating.add(self)
return self
def __exit__(self, e, t, b):
w = self.weakcontainer()
if w is not None:
s = w._iterating
s.remove(self)
if not s:
w._commit_removals()
Here's where the WeakKeyDictionary weakref callback checks the guard:
def remove(k, selfref=ref(self)):
self = selfref()
if self is not None:
if self._iterating:
self._pending_removals.append(k)
else:
del self.data[k]
And here's where WeakKeyDictionary.__iter__
sets the guard:
def keys(self):
with _IterationGuard(self):
for wr in self.data:
obj = wr()
if obj is not None:
yield obj
__iter__ = keys
The same guard is used in the other iterators.
If this guard did not exist, calling list(d.items())
wouldn't be safe either. A GC pass could happen inside the items
iterator and remove items from the dict during iteration. (The fact that list
is written in C would provide no protection.)
Back in 2.6 and earlier, the safest way to iterate over a WeakKeyDictionary or WeakValueDictionary would have been to use items
. items
would return a list, and it would use the underlying dict's items
method, which would have been (mostly?) not interruptible by GC. The dict API changes in 3.0 changed how keys
/values
/items
worked, which is probably why the guard was introduced when it was.
To be safe you have to keep a reference somewhere. Using the idiom:
for k,v in list(d.items()):
is not completely safe because, even though it will work most of the time, during the last iteration of the loop the list may be garbage-collected.
The correct way would be:
items = list(d.items())
for k,v in items:
#do stuff that doesn't have a chance of destroying "items"
del items
If you use a WeakKeyDictionary
you could simply store the keys, and store values if you use WeakValueDictionary
.
On a side note: in python2 .items()
already returns a list.
Ultimately it depends on what do you mean by "safe". If you simply mean that the iteration will proceed correctly(iterating once on all the elements), then:
for k,v in list(d.items()):
is safe, because the iteration over the dictionary is actually performed by list(d.items())
, then you are only iterating over the list.
If you, instead, mean that during the iteration elements should not "disappear" from the dictionary as side-effect of the for
-loop, then you must keep a strong reference until the end of the loop, and this requires you to store the list in a variable before starting the loop.
Convert to strong references without using iteration first.
items = []
while d:
try:
items.append(d.popitem())
except KeyError:
pass
If it loses some keys during the while loop, it shouldn't cause problems.
Then you can iterate over items
instead. When you're done, d.update(items)
to put them back, and then del items
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With