This code creates a race condition:

    import threading

    ITERS = 100000
    x = [0]

    def worker():
        for _ in range(ITERS):
            x[0] += 1  # this line creates a race condition
            # because it reads a value, increments it, and then writes it back;
            # increments from the two threads can interleave, and some get lost

    def main():
        x[0] = 0  # you may use `global x` instead of this list trick too
        t1 = threading.Thread(target=worker)
        t2 = threading.Thread(target=worker)
        t1.start()
        t2.start()
        t1.join()
        t2.join()

    for i in range(5):
        main()
        print(f'iteration {i}. expected x = {ITERS*2}, got {x[0]}')

Output:
$ python3 test.py
iteration 0. expected x = 200000, got 200000
iteration 1. expected x = 200000, got 148115
iteration 2. expected x = 200000, got 155071
iteration 3. expected x = 200000, got 200000
iteration 4. expected x = 200000, got 200000
Python3 version:
Python 3.9.7 (default, Sep 10 2021, 14:59:43)
[GCC 11.2.0] on linux
I thought the GIL would prevent this and would not allow two threads to run together until they do something I/O-related or call into a C library. At least, that is what you might conclude from the docs.
So what does the GIL actually do, and when do threads run in parallel?
Reading the docs more carefully, I think the answer is there:
The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.
However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally-intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.
I don't know the internals, but my guess is that each bytecode instruction is executed on its own while the other threads wait (which makes it slow). A single source line, however, can compile to several instructions, so it is not atomic.
Here's what you get if you run dis.dis('x[0] += 1'):

     0 LOAD_NAME                0 (x)
     2 LOAD_CONST               0 (0)
     4 DUP_TOP_TWO
     6 BINARY_SUBSCR
     8 LOAD_CONST               1 (1)
    10 INPLACE_ADD
    12 ROT_THREE
    14 STORE_SUBSCR
    16 LOAD_CONST               2 (None)
    18 RETURN_VALUE

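A thread switch can happen between any two of these bytecode instructions, and that is what creates the race condition: one thread can read the value (BINARY_SUBSCR) and be interrupted before it writes the result back (STORE_SUBSCR), so another thread's increment gets overwritten. So the GIL only guarantees that the internals of structures like list or dict won't be damaged.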
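
To check this on your own interpreter, the same disassembly can be collected programmatically. This is only a small sketch: the opcode names and the number of instructions differ between CPython versions, so nothing here depends on the exact 3.9 listing above, and the variable names are just illustrative.

```python
import dis

# Compile the statement from the question and list its bytecode instructions.
# Opcode names vary between CPython versions, so none are hard-coded here.
instructions = list(dis.get_instructions("x[0] += 1"))
opnames = [ins.opname for ins in instructions]

for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

# More than one instruction means the source line is not a single atomic step.
print(f"{len(opnames)} instructions for one source line")
```

If the count is greater than one, the read and the write are separate steps as far as the interpreter is concerned.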
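As per the final comments, this particular example no longer appears to reproduce on Python 3.10 and above (checked on Ubuntu and Windows). That looks like a side effect of when the interpreter lets threads switch rather than a guarantee, so the increment should still not be treated as atomic.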
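
Whatever the interpreter version, the counter from the question can be made safe explicitly with a lock. The following is a minimal sketch, assuming the same two-thread setup as the question; `counter`, `counter_lock` and `run_once` are names made up for this example, not part of the original code.

```python
import threading

ITERS = 100_000  # same iteration count as in the question
counter = [0]
counter_lock = threading.Lock()  # hypothetical name, added for this sketch


def worker():
    for _ in range(ITERS):
        # Only one thread at a time can run the read-increment-write sequence.
        with counter_lock:
            counter[0] += 1


def run_once():
    counter[0] = 0
    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


result = run_once()
print(f"expected {ITERS * 2}, got {result}")
```

The lock costs some speed, but the result no longer depends on when the interpreter decides to switch threads.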
Race conditions can still be observed in other scenarios, though. For example:

    import threading
    import time

    x = 10

    def increment(by):
        global x
        local_counter = x
        local_counter += by
        time.sleep(1)
        x = local_counter
        print(f'{threading.current_thread().name} inc x {by}, x: {x}')

    def main():
        # creating threads
        t1 = threading.Thread(target=increment, args=(5,))
        t2 = threading.Thread(target=increment, args=(10,))
        # starting the threads
        t1.start()
        t2.start()
        # waiting for the threads to complete
        t1.join()
        t2.join()
        print(f'The final value of x is {x}')

    for i in range(10):
        main()

which produces this:
Thread-56 (increment) inc x 10, x: 20Thread-55 (increment) inc x 5, x: 15
The final value of x is 15
Thread-57 (increment) inc x 5, x: 20Thread-58 (increment) inc x 10, x: 25
The final value of x is 25
Thread-60 (increment) inc x 10, x: 35Thread-59 (increment) inc x 5, x: 30
The final value of x is 30
Thread-61 (increment) inc x 5, x: 35
Thread-62 (increment) inc x 10, x: 40
The final value of x is 40
Thread-64 (increment) inc x 10, x: 50Thread-63 (increment) inc x 5, x: 45
The final value of x is 45
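The fix suggested here is to use the asyncio module to control the flow of the code. If you keep the threads, holding a threading.Lock around the read-modify-write of x also removes the race.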
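
If you would rather keep the threads, a lock around the whole read-modify-write is one way to remove the lost update. This is a sketch under a few assumptions: `x_lock` is a name invented for the example, the sleep is shortened to 0.1 seconds so it runs quickly, and `main` returns the final value instead of printing it.

```python
import threading
import time

x = 10
x_lock = threading.Lock()  # hypothetical name, added for this sketch


def increment(by):
    global x
    # Hold the lock across the whole read-modify-write, including the sleep,
    # so the other thread cannot read a stale value of x in between.
    with x_lock:
        local_counter = x
        local_counter += by
        time.sleep(0.1)  # shortened from the 1 second used above
        x = local_counter


def main():
    t1 = threading.Thread(target=increment, args=(5,))
    t2 = threading.Thread(target=increment, args=(10,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return x


final = main()
print(f"The final value of x is {final}")
```

With the lock held, the second thread only reads x after the first has written it, so both increments are applied (10 + 5 + 10 = 25). The trade-off is that the two threads now run their critical sections one after the other.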