
GIL behavior in Python 3.7 multithreading

I am researching the Python GIL and best practices for using multithreading in Python. I found this presentation and this video.

I tried to reproduce the strange problems described in the first 4 slides of the presentation, which the lecturer also mentions in the first 4 minutes of the video. I wrote this simple code to reproduce the problem:

from threading import Thread
from time import time

BIG_NUMBER = 100000
count = BIG_NUMBER


def countdown(n):
    global count
    for i in range(n):
        count -= 1


start = time()
countdown(count)
end = time()
print('Without Threading: Final count = {final_n}, Execution Time = {exec_time}'.format(final_n=count, exec_time=end - start))

count = BIG_NUMBER
a = Thread(target=countdown, args=(BIG_NUMBER//2,))
b = Thread(target=countdown, args=(BIG_NUMBER//2,))
start = time()
a.start()
b.start()
a.join()
b.join()
end = time()
print('With Threading: Final count = {final_n}, Execution Time = {exec_time}'.format(final_n=count, exec_time=end - start))

But the results are completely different from the presentation and the video! The execution times with and without threading are almost the same; sometimes one case is slightly faster than the other.

Here is a result I got using CPython 3.7.3 on Windows 10 with a multicore processor:

Without Threading: Final count = 0, Execution Time = 0.02498459815979004
With Threading: Final count = 21, Execution Time = 0.023985862731933594

Also, what I understand from the video and the presentation is that the GIL prevents two threads from truly executing in parallel on two cores. If that is true, why is the final count (in the multithreading case) not zero as expected, and why is it a different number at the end of each execution, probably because the threads manipulate it at the same time? Did anything change in the GIL in newer Python versions, compared with the one used in the video and the presentation (Python 3.2), that causes these differences? Thanks in advance.

Hamid Reza Arzaghi asked Apr 03 '19 13:04



1 Answer

Python is not executed directly. It is first compiled into so-called Python bytecode, which is similar in spirit to raw assembly. The interpreter then executes that bytecode.

What the GIL does is prevent two bytecode instructions from running in parallel. Some operations (e.g. I/O) do release the GIL internally, though, to allow real concurrency when it is known that this cannot break anything.
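As a minimal sketch of that last point, the snippet below uses time.sleep as a stand-in for blocking I/O, since sleeping releases the GIL while the thread waits. The names SLEEP_SECONDS, NUM_THREADS and wait are made up for this illustration. If the waits overlap, the total elapsed time should be close to one sleep rather than the sum of all of them.

```python
from threading import Thread
from time import perf_counter, sleep

SLEEP_SECONDS = 0.5  # hypothetical duration standing in for a blocking I/O call
NUM_THREADS = 4      # hypothetical number of worker threads


def wait():
    # time.sleep releases the GIL while waiting, so other threads can run.
    sleep(SLEEP_SECONDS)


threads = [Thread(target=wait) for _ in range(NUM_THREADS)]
start = perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = perf_counter() - start

# If the waits ran one after another, this would take about
# NUM_THREADS * SLEEP_SECONDS; overlapping waits finish much sooner.
print('Elapsed = {:.3f}s for {} threads'.format(elapsed, NUM_THREADS))
```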

Now all you have to know is that count -= 1 does not compile into a single bytecode instruction. It actually compiles into 4 instructions

LOAD_GLOBAL              1 (count)
LOAD_CONST               1 (1)
INPLACE_SUBTRACT
STORE_GLOBAL             1 (count)

which roughly means

push the current value of the global count onto the stack
push the constant 1 onto the stack
subtract the two values on top of the stack
store the result back into the global count
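The listing above can be reproduced with the standard dis module. This is only a sketch: the helper function decrement is invented for the illustration, and the exact opcode names depend on the CPython version (the INPLACE_SUBTRACT name shown above is what CPython 3.7 prints; newer releases may name the subtraction step differently).

```python
import dis

count = 0


def decrement():
    # Hypothetical helper that contains only the statement of interest.
    global count
    count -= 1


# Collect the opcode names for the function body.
ops = [ins.opname for ins in dis.get_instructions(decrement)]

# Keep only the instructions from loading the global to storing it back.
load_idx = ops.index('LOAD_GLOBAL')
store_idx = ops.index('STORE_GLOBAL')
steps = ops[load_idx:store_idx + 1]

# More than one instruction means a thread switch can fall in between.
print(steps)
```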

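Each of these instructions is atomic, but instructions from different threads can interleave. For example, both threads can load the same value of count, each subtract 1, and each store the same result, so one decrement is lost. That is why the final count ends up above zero and differs from run to run.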
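One common way to avoid this kind of interleaving, which the question's code does not use, is to guard the read-modify-write with a threading.Lock. The sketch below adapts the question's code under that assumption; the name count_lock is made up here. Taking the lock on every iteration costs extra time, so this shows correctness rather than speed.

```python
from threading import Lock, Thread

BIG_NUMBER = 100000
count = BIG_NUMBER
count_lock = Lock()  # hypothetical name; protects every update of count


def countdown(n):
    global count
    for _ in range(n):
        # Only one thread at a time may load, subtract and store,
        # so the separate bytecode steps can no longer interleave.
        with count_lock:
            count -= 1


threads = [Thread(target=countdown, args=(BIG_NUMBER // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('With Threading and Lock: Final count = {final_n}'.format(final_n=count))
```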

So the GIL makes the execution flow serial: instructions happen one after another, and nothing runs in parallel. In theory, multiple threads therefore perform about the same as a single thread, apart from some extra time spent on (so-called) context switches. My tests in Python 3.6 confirm that the execution time is similar.
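How often those switches can happen is related to CPython's thread switch interval, which is exposed through the sys module in Python 3.2 and later. The snippet below is only a sketch for inspecting that value on your own interpreter; the documented default is 5 milliseconds, but it can be changed with sys.setswitchinterval.

```python
import sys

# The interval (in seconds) after which the interpreter asks the
# running thread to give up the GIL so another thread can run.
interval = sys.getswitchinterval()
print('Thread switch interval = {interval} seconds'.format(interval=interval))
```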

However, in Python 2.7 my tests showed significant performance degradation with threads, about 1.5x. I don't know the exact reason for this. One likely factor is that the GIL implementation was rewritten in Python 3.2, so Python 2.7 still uses the older one.

freakish answered Sep 29 '22 20:09