Is the usage of list(d.items()) in the example below safe?
import threading

n = 2000
d = {}

def dict_to_list():
    while True:
        list(d.items())  # is this safe to do?

def modify():
    for i in range(n):
        d[i] = i

if __name__ == "__main__":
    t1 = threading.Thread(target=dict_to_list, daemon=True)
    t1.start()
    t2 = threading.Thread(target=modify, daemon=True)
    t2.start()
    t2.join()
The background behind this question is that an iterator over a dictionary item view checks on every step whether the dictionary size changed, as the following example illustrates.
d = {}
view = d.items() # this is an iterable
it = iter(view) # this is an iterator
d[1] = 1
print(list(view)) # this is ok, it prints [(1, 1)]
print(list(it)) # this raises a RuntimeError because the size of the dictionary changed
So if the call to list(...) in the first example above can be interrupted (i.e., if thread t1 can release the GIL in the middle of it), the first example might cause RuntimeErrors to occur in thread t1. There are sources that claim the operation is not atomic, see here. However, I haven't been able to get the first example to crash.
I understand that the safe thing to do here would be to use some locks instead of trying to rely on the atomicity of certain operations. However, I'm debugging an issue in a third party library that uses similar code and that I cannot necessarily change directly.
Many common operations on a dict are atomic, meaning that they are thread-safe. Atomic means that the operation either happens completely or not at all, with no inconsistent state in between. In CPython, operations such as adding, removing, and reading a single value of a dict are atomic, as long as the keys' __hash__ and __eq__ are implemented in C.
Python's built-in structures are thread-safe for single operations, but it can sometimes be hard to see where a statement really becomes multiple operations. Your code should be safe.
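One way to see where a statement turns into several operations is to disassemble it with `dis`. The sketch below compares `d[k] += 1` with `d[k] = 1`. The augmented assignment is a read-modify-write: it compiles to a subscript load, an addition and a `STORE_SUBSCR` as separate instructions, so another thread can run between the read and the write. The exact opcode names differ between CPython versions, which is why the sketch only relies on `STORE_SUBSCR` and on the instruction count.

```python
import dis

def opnames(source):
    """Return the bytecode opcode names CPython generates for a snippet."""
    return [ins.opname for ins in dis.get_instructions(source)]

# Read-modify-write: the subscript load, the addition and STORE_SUBSCR are
# separate instructions, so a thread switch can happen between them.
compound = opnames("d[k] += 1")

# A plain assignment only needs one STORE_SUBSCR after loading its operands.
simple = opnames("d[k] = 1")

if __name__ == "__main__":
    print(compound)
    print(simple)
```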
update() is thread-safe. In particular, for your example with integer keys, yes.
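The sketch below applies that idea to the question's writer. It builds the new entries first and then publishes them with one `d.update()` call, instead of making one assignment per key. The name `modify_batched` is hypothetical. The sketch assumes CPython's GIL and built-in `int` keys, whose hashing and comparison run in C. Under those conditions a concurrent reader should see either none or all of the batch. This is an implementation detail, not a language guarantee.

```python
n = 2000
d = {}

def modify_batched():
    # Build the batch without touching the shared dict.
    batch = {i: i for i in range(n)}
    # Publish it with a single C-level call; with int keys, CPython does not
    # switch threads in the middle of this update() (implementation detail).
    d.update(batch)

if __name__ == "__main__":
    modify_batched()
    print(len(d))
```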
The items() method returns a view object. The view object provides the key-value pairs of the dictionary as tuples, and it reflects any changes made to the dictionary.
I suspect the author of that article was confused about dict views, thinking dict.items returns an iterator like dict.iteritems did in Python 2, not an iterable like it does in Python 3. Note that that article was written almost 13 years ago, five months before Python 3.0 was released. Btw, as PEP 3106 says (emphasis mine):
The original plan was to simply let .keys(), .values() and .items() return an iterator, i.e. exactly what iterkeys(), itervalues() and iteritems() return in Python 2.x.
Python 2, iteritems gives an iterator:
>>> d = {1: 1, 2: 2, 3: 3}
>>> items = d.iteritems()
>>> items
<dictionary-itemiterator object at 0x0000000003EBA958>
>>> next(items)
(1, 1)
>>> list(items)
[(2, 2), (3, 3)]
>>> list(items)
[]
Python 3, items gives an iterable, not an iterator:
>>> d = {1: 1, 2: 2, 3: 3}
>>> items = d.items()
>>> items
dict_items([(1, 1), (2, 2), (3, 3)])
>>> next(items)
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    next(items)
TypeError: 'dict_items' object is not an iterator
>>> list(items)
[(1, 1), (2, 2), (3, 3)]
>>> list(items)
[(1, 1), (2, 2), (3, 3)]
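For comparison, Python 3 still supports the Python 2 `iteritems()` behavior if you ask the view for an iterator explicitly with `iter()`. This one-shot object is the same kind of iterator that `list()` creates internally from the view:

```python
d = {1: 1, 2: 2, 3: 3}
items = iter(d.items())  # explicit iterator, like Python 2's d.iteritems()

first = next(items)  # (1, 1)
rest = list(items)   # [(2, 2), (3, 3)]
again = list(items)  # [] because the iterator is already exhausted

print(type(items).__name__, first, rest, again)
```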
And in Python 2, with the iterator, this does cause the error:
>>> d = {1: 1, 2: 2, 3: 3}
>>> items = d.iteritems()
>>> d[4] = 4
>>> next(items)
Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    next(items)
RuntimeError: dictionary changed size during iteration
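The same error can be reproduced in Python 3 with an explicit iterator. The single-threaded sketch below simulates the interleaving that a second thread could cause: the dict grows after `iter()` has created the iterator but before the iterator is consumed. A fresh `list(d.items())` call afterwards is unaffected.

```python
d = {1: 1, 2: 2, 3: 3}
items = iter(d.items())  # iterator created...
d[4] = 4                 # ...then the dict grows (another thread could do this)

try:
    next(items)
except RuntimeError as exc:
    error = str(exc)
else:
    error = None

view_ok = list(d.items())  # a new view-based list() call works fine
print(error)
print(view_ok)
```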
In Python 3, if d.items() did return an iterator, i.e., if it were equivalent to iter(d.items()), then it would be unsafe, because your thread might get interrupted between the creation of the iterator by iter() and its consumption by list(). But since it returns an iterable, it's the list() function itself that internally creates an iterator from the iterable, so both the iterator creation and its consumption happen during the same single bytecode instruction (the call to list()).
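One way to check this claim is to count the call instructions with `dis`. Opcode names vary between CPython versions: `CALL_METHOD`/`CALL_FUNCTION` before 3.11, and `CALL` afterwards (3.11 also emits `PRECALL`). The sketch therefore counts only the opcodes whose names start with `CALL`. `list(d.items())` contains two calls, and the second one both creates and drains the iterator. `list(iter(d.items()))` contains a third call, so a thread switch can fall between `iter()` and `list()`.

```python
import dis

def call_count(source):
    """Count call opcodes (CALL, CALL_FUNCTION, CALL_METHOD, ...) in a snippet."""
    return sum(ins.opname.startswith("CALL") for ins in dis.get_instructions(source))

view_calls = call_count("list(d.items())")        # items(), then list()
iter_calls = call_count("list(iter(d.items()))")  # items(), iter(), then list()

print(view_calls, iter_calls)
```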
If you change your code to list(iter(d.items())) and increase n to, let's say, 20000000, then you'll likely get the error. Example from a run on Try it online!:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File ".code.tio", line 9, in dict_to_list
    list(iter(d.items())) # is this safe to do?
RuntimeError: dictionary changed size during iteration
Short answer: it might be fine but use a lock anyway.
Using dis you can see that list(d.items()) is effectively two bytecode instructions (6 and 8):
>>> import dis
>>> dis.dis("list(d.items())")
1 0 LOAD_NAME 0 (list)
2 LOAD_NAME 1 (d)
4 LOAD_METHOD 2 (items)
6 CALL_METHOD 0
8 CALL_FUNCTION 1
10 RETURN_VALUE
The Python FAQ says that (generally) things implemented in C are atomic (from the point of view of a running Python program):
What kinds of global value mutation are thread-safe?
In general, Python offers to switch among threads only between bytecode instructions; [...]. Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program.
[...]
For example, the following operations are all atomic [...]
D.keys()
list() is implemented in C and d.items() is implemented in C, so each should be atomic, unless they end up somehow calling out to Python code (which can happen if they call out to a dunder method that you overrode using a Python implementation), or if you're using a subclass of dict and not a real dict, or if their C implementation releases the GIL. It's not a good idea to rely on them being atomic.
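The sketch below illustrates the "calling out to Python code" case. `LoggingDict` is a hypothetical dict subclass whose `items()` is written in Python. Every call to it executes Python bytecode, so the interpreter may switch threads in the middle of it. On a real `dict`, by contrast, `items()` runs entirely in C.

```python
class LoggingDict(dict):
    """Hypothetical dict subclass whose items() is implemented in Python."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = 0

    def items(self):
        # Several Python bytecode instructions run here, and a thread switch
        # may happen between any two of them.
        self.calls += 1
        return super().items()

d = LoggingDict({1: 1, 2: 2})
snapshot = list(d.items())  # goes through the Python-level items() above
print(snapshot, d.calls)
```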
You mention that an iterator created with iter() will raise an error if its underlying dict changes size, but that's not relevant here because .keys(), .values() and .items() return view objects, and those have no problem with the underlying dict changing:
d = {"a": 1, "b": 2}
view = d.items()
print(list(view)) # [("a", 1), ("b", 2)]
d["c"] = 3 # this could happen in a different thread
print(list(view)) # [("a", 1), ("b", 2), ("c", 3)]
If you're modifying the dict in more than one instruction at a time, you'll sometimes get d in an inconsistent state where some of the modifications have been made and some haven't yet, but you shouldn't get a RuntimeError like you do with iter(), unless you modify it in a way that's non-atomic.
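Note that a view is only safe when the whole consumption happens inside one C call such as `list(view)`. A Python-level `for` loop over the view calls `iter()` implicitly and advances the iterator across many bytecode instructions. A concurrent size change during the loop therefore raises the same `RuntimeError`. The single-threaded sketch below triggers it by growing the dict from inside the loop body, which stands in for another thread doing so:

```python
d = {"a": 1, "b": 2}
view = d.items()

seen = []
try:
    for key, value in view:  # the for statement calls iter(view) implicitly
        seen.append(key)
        d[key + "!"] = value  # size change while the iterator is live
except RuntimeError as exc:
    error = str(exc)
else:
    error = None

print(seen, error)
```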