Some discussion in another question has encouraged me to to better understand cases where locking is required in multithreaded Python programs.
Per this article on threading in Python, I have several solid, testable examples of pitfalls that can occur when multiple threads access shared state. The example race condition provided on this page involves races between threads reading and manipulating a shared variable stored in a dictionary. I think the case for a race here is very obvious, and fortunately is eminently testable.
However, I have been unable to evoke a race condition with atomic operations such as list appends or variable increments. This test exhaustively attempts to demonstrate such a race:
from threading import Thread, Lock
import operator
def contains_all_ints(l, n):
l.sort()
for i in xrange(0, n):
if l[i] != i:
return False
return True
def test(ntests):
results = []
threads = []
def lockless_append(i):
results.append(i)
for i in xrange(0, ntests):
threads.append(Thread(target=lockless_append, args=(i,)))
threads[i].start()
for i in xrange(0, ntests):
threads[i].join()
if len(results) != ntests or not contains_all_ints(results, ntests):
return False
else:
return True
for i in range(0,100):
if test(100000):
print "OK", i
else:
print "appending to a list without locks *is* unsafe"
exit()
I have run the test above without failure (100x 100k multithreaded appends). Can anyone get it to fail? Is there another class of object which can be made to misbehave via atomic, incremental, modification by threads?
Do these implicitly 'atomic' semantics apply to other operations in Python? Is this directly related to the GIL?
When working with threads in Python, you will find very useful to be able to share data between different tasks. One of the advantages of threads in Python is that they share the same memory space, and thus exchanging information is relatively easy.
You can protect data variables shared between threads using a threading. Lock mutex lock, and you can share data between threads explicitly using queue.
Python is not by its self thread safe. But there are moves to change this: NoGil, etc. Removing the GIL does not make functions thread-safe.
In fact, a Python process cannot run threads in parallel but it can run them concurrently through context switching during I/O bound operations. This limitation is actually enforced by GIL. The Python Global Interpreter Lock (GIL) prevents threads within the same process to be executed at the same time.
Appending to a list is thread-safe, yes. You can only append to a list while holding the GIL, and the list takes care not to release the GIL during the append
operation (which is, after all, a fairly simple operation.) The order in which different thread's append operations go through is of course up for grabs, but they will all be strictly serialized operations because the GIL is never released during an append.
The same is not necessarily true for other operations. Lots of operations in Python can cause arbitrary Python code to be executed, which in turn can cause the GIL to be released. For example, i += 1
is three distinct operations, "get i
', "add 1 to it" and "store it in i
". "add 1 to it" would translate (in this case) into it.__iadd__(1)
, which can go off and do whatever it likes.
Python objects themselves guard their own internal state -- dicts won't get corrupted by two different threads trying to set items in them. But if the data in the dict is supposed to be internally consistent, neither the dict nor the GIL does anything to protect that, except (in usual thread fashion) by making it less likely but still possible things end up different than you thought.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With