Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does this Python code with threading have race conditions?

This code creates a race condition:

import threading

ITERS = 100000
x = [0]

def worker():
    for _ in range(ITERS):
        x[0] += 1  # this line creates a race condition
        # because it takes a value, increments and then writes
        # some inrcements can be done together, and lost

def main():
    x[0] = 0  # you may use `global x` instead of this list trick too
    t1 = threading.Thread(target=worker)
    t2 = threading.Thread(target=worker)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

for i in range(5):
    main()
    print(f'iteration {i}. expected x = {ITERS*2}, got {x[0]}')

Output:

$ python3 test.py
iteration 0. expected x = 200000, got 200000
iteration 1. expected x = 200000, got 148115
iteration 2. expected x = 200000, got 155071
iteration 3. expected x = 200000, got 200000
iteration 4. expected x = 200000, got 200000

Python3 version:

Python 3.9.7 (default, Sep 10 2021, 14:59:43) 
[GCC 11.2.0] on linux

I thought GIL would prevent it and not allow two threads run together until they do something io-related or call a C library. At least this is what you may conclude from the docs.

Then, what does GIL actually do, and when do threads run in parallel?

like image 313
culebrón Avatar asked Dec 27 '21 08:12

culebrón


People also ask

How do you fix race condition in a thread?

an easy way to fix "check and act" race conditions is to synchronized keyword and enforce locking which will make this operation atomic and guarantees that block or method will only be executed by one thread and result of the operation will be visible to all threads once synchronized blocks completed or thread exited ...

Does Python have race conditions?

There are many types of race conditions, although a common type of race condition is when two or more threads attempt to change the same data variable. NOTE: Race conditions are a real problem in Python when using threads, even in the presence of the global interpreter lock (GIL).

What is threading condition in Python?

A condition variable allows one or more threads to wait until they are notified by another thread. If the lock argument is given and not None , it must be a Lock or RLock object, and it is used as the underlying lock. Otherwise, a new RLock object is created and used as the underlying lock.

What is a race condition Why is it undesirable when programming threads?

A race condition is an undesirable situation that occurs when a device or system attempts to perform two or more operations at the same time, but because of the nature of the device or system, the operations must be done in the proper sequence to be done correctly.

What is a race condition in Python threading?

Summary: in this tutorial, you’ll learn about the race conditions and how to use the Python threading Lock object to prevent them. What is a race condition? A race condition occurs when two threads try to access a shared variable simultaneously. The first thread reads the value from the shared variable.

What happens when a thread acquires a lock in Python?

Once a thread acquires a lock, no other thread can access the shared resource until and unless it releases it. It helps to avoid the race condition. What is a lock object in Python?

What is a thread in Python?

A thread is a separate flow of execution. This means that your program will have two things happening at once. But for most Python 3 implementations the different threads do not actually execute at the same time: they merely appear to.

What is a race condition in Java?

A race condition happens when more than one thread is trying to access a shared piece of data at the same time.


Video Answer


2 Answers

Reading the docs better, I think there's the answer:

The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.

However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally-intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.

I don't know the internals, but guess each line or block of this bytecode is executed alone, and other threads are waiting (which makes it slow). But some lines consist of multiple blocks, and aren't atomic.

Here's what you get if run dis.dis('x[0] += 1'):

          0 LOAD_NAME                0 (x)
          2 LOAD_CONST               0 (0)
          4 DUP_TOP_TWO
          6 BINARY_SUBSCR
          8 LOAD_CONST               1 (1)
         10 INPLACE_ADD
         12 ROT_THREE
         14 STORE_SUBSCR
         16 LOAD_CONST               2 (None)
         18 RETURN_VALUE

Some of these are executed in concurrent way, and make the race condition. So GIL only guarantees the internals of structures like list or dict won't be damaged.

like image 157
culebrón Avatar answered Nov 10 '22 10:11

culebrón


As per our final comments, it appears as though this has been fixed (ubuntu, windows) with python version 3.10 and above. This issue is no longer experienced.

However, there other scenarios where race conditions can be obsevered. For example this:

import threading
import time
 
x = 10
 
def increment(by):
    global x
 
    local_counter = x
    local_counter += by
 
    time.sleep(1)
 
    x = local_counter
    print(f'{threading.current_thread().name} inc x {by}, x: {x}')
 
def main():
    # creating threads
    t1 = threading.Thread(target=increment, args=(5,))
    t2 = threading.Thread(target=increment, args=(10,))
   
    # starting the threads
    t1.start()
    t2.start()
   
    # waiting for the threads to complete
    t1.join()
    t2.join()
   
    print(f'The final value of x is {x}')
 
for i in range(10):
    main()

which produces this:

Thread-56 (increment) inc x 10, x: 20Thread-55 (increment) inc x 5, x: 15
 
The final value of x is 15
Thread-57 (increment) inc x 5, x: 20Thread-58 (increment) inc x 10, x: 25
 
The final value of x is 25
Thread-60 (increment) inc x 10, x: 35Thread-59 (increment) inc x 5, x: 30
 
The final value of x is 30
Thread-61 (increment) inc x 5, x: 35
Thread-62 (increment) inc x 10, x: 40
The final value of x is 40
Thread-64 (increment) inc x 10, x: 50Thread-63 (increment) inc x 5, x: 45
 
The final value of x is 45

but the fix here is to use the asyncio module to control the flow of the code.

like image 33
D.L Avatar answered Nov 10 '22 11:11

D.L