
Why are numpy calculations not affected by the global interpreter lock?

I'm trying to decide if I should use multiprocessing or threading, and I've learned some interesting bits about the Global Interpreter Lock. In this nice blog post, it seems multithreading isn't suitable for CPU-bound tasks. However, I also learned that some functionality, such as I/O or numpy, is unaffected by the GIL.

Can anyone explain why, and how I can find out if my (probably quite numpy-heavy) code is going to be suitable for multithreading?

Lisa asked Apr 07 '16 14:04

People also ask

Is NumPy affected by the GIL?

There is a category of libraries that are not affected by the CPython GIL, and NumPy is one example. According to this post on Stack Overflow, many NumPy calculations are unaffected by the GIL, but not all.

What are the implications of Python global interpreter lock?

The Python Global Interpreter Lock, or GIL, is, in simple words, a mutex (a lock) that allows only one thread to hold control of the Python interpreter. This means that only one thread can be in a state of execution at any point in time.

Does Python use real threads if it uses a global interpreter lock describe with an example?

Yes, CPython uses real operating-system threads, but the Global Interpreter Lock ensures that only one of them is executing Python bytecode at any given time. Therefore, it is impossible to take advantage of multiple processors with threads alone. Since CPython's memory management is not thread-safe, the GIL prevents race conditions and ensures thread safety.
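A rough way to see this for yourself (timings are illustrative and will vary between machines): a CPU-bound pure-Python function takes about as long on two threads as it does when run twice in a row, because the threads simply take turns holding the GIL.

```python
# Pure-Python bytecode holds the GIL, so two threads of CPU-bound Python work
# take roughly as long as running the same work twice serially.
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter

def busy(n=5_000_000):
    total = 0
    for i in range(n):      # plain Python loop: the GIL is held throughout
        total += i * i
    return total

start = perf_counter()
busy(); busy()
print("serial   :", perf_counter() - start)

start = perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(lambda _: busy(), range(2)))
print("2 threads:", perf_counter() - start)   # roughly the same as the serial time
```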

Does NumPy automatically use multiple cores?

NumPy does not automatically run your code in parallel: its functions are not, by themselves, going to use multiple CPU cores, never mind the GPU. Numba, on the other hand, can fully utilize the parallel execution capabilities of your computer. A sketch of that route is shown below.
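A minimal sketch of the Numba route, assuming Numba is installed; `parallel_sum_of_squares` is just an illustrative name, not a library function:

```python
# Numba can compile a loop and, with parallel=True, split its iterations across
# CPU cores - something a plain NumPy reduction will not do by itself.
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def parallel_sum_of_squares(a):
    total = 0.0
    for i in prange(a.size):   # prange marks the loop for parallel execution
        total += a[i] * a[i]
    return total

x = np.random.rand(10_000_000)
print(np.dot(x, x))                  # NumPy inner product (one call, no explicit parallelism)
print(parallel_sum_of_squares(x))    # same result, loop spread across cores by Numba
```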


1 Answer

Many numpy calculations are unaffected by the GIL, but not all.

Within code that does not require the Python interpreter (e.g. C libraries) it is possible to explicitly release the GIL, allowing other code that depends on the interpreter to continue running. In the NumPy C codebase the macros NPY_BEGIN_THREADS and NPY_END_THREADS are used to delimit blocks of code that run with the GIL released. You can see these in this search of the numpy source.

The NumPy C API documentation has more information on threading support. Note the additional macros NPY_BEGIN_THREADS_DESCR, NPY_END_THREADS_DESCR and NPY_BEGIN_THREADS_THRESHOLDED which handle conditional GIL release, dependent on array dtypes and the size of loops.
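One way to observe the effect of these macros from Python (a rough experiment; timings depend on your machine and BLAS build): run a large NumPy operation and some pure-Python work in two threads. If the NumPy call releases the GIL, the combined wall time is close to the longer of the two jobs rather than their sum.

```python
# If the NumPy call drops the GIL, the pure-Python thread can run at the same
# time, so "both in threads" should take about max(numpy, python), not the sum.
import threading
from time import perf_counter
import numpy as np

a = np.random.rand(3000, 3000)

def numpy_work():
    _ = a @ a               # large matrix product; GIL released while it runs

def python_work(n=5_000_000):
    total = 0
    for i in range(n):      # pure-Python loop; needs the GIL the whole time
        total += i

for label, jobs in [("numpy alone    ", [numpy_work]),
                    ("python alone   ", [python_work]),
                    ("both in threads", [numpy_work, python_work])]:
    threads = [threading.Thread(target=job) for job in jobs]
    start = perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(label, round(perf_counter() - start, 3))
```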

Most core functions release the GIL - for example, Universal Functions (ufuncs) do so, as the documentation describes:

as long as no object arrays are involved, the Python Global Interpreter Lock (GIL) is released prior to calling the loops. It is re-acquired if necessary to handle error conditions.
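A quick check of the quoted behaviour (array sizes and timings are only illustrative): run the same ufunc call from one thread and from two threads, once on a float64 array and once on an object array. Only the float64 case should show the threads overlapping.

```python
# float64 ufunc loops run with the GIL released, so two threads overlap;
# object-dtype loops call back into Python per element and stay serialized.
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter

def timed(work, n_threads):
    # Each thread runs `work` once. If the GIL is released inside `work`, the
    # 2-thread time stays close to the 1-thread time instead of doubling.
    start = perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(lambda _: work(), range(n_threads)))
    return perf_counter() - start

float_arr = np.random.rand(20_000_000)               # float64 dtype
obj_arr = np.random.rand(1_000_000).astype(object)   # object dtype

for name, arr in [("float64", float_arr), ("object ", obj_arr)]:
    work = lambda a=arr: np.multiply(a, a)
    print(name, "1 thread:", round(timed(work, 1), 3),
          "2 threads:", round(timed(work, 2), 3))
```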

With regard to your own code, the source code for NumPy is available. Check the functions you use (and the functions they call) for the above macros. Note also that the performance benefit is heavily dependent on how long the GIL is released - if your code is constantly dropping in/out of Python you won't see much of an improvement.

The other option is to just test it. However, bear in mind that functions using the conditional GIL macros may exhibit different behaviour with small and large arrays. A test with a small dataset may therefore not be an accurate representation of performance for a larger task.
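If you do test it, something along these lines separates the small-array and large-array cases (a sketch; the exact release threshold is an implementation detail, so treat the sizes as guesses):

```python
# The same total amount of work, done as many tiny calls or a few large calls.
# Compare the 1-thread and 2-thread times within each row.
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter

small = np.random.rand(500)          # small enough that the GIL may never be dropped
large = np.random.rand(5_000_000)    # large enough for the loop to run GIL-free

def work(arr, repeats):
    for _ in range(repeats):
        np.sqrt(arr)

def timed(arr, repeats, n_threads):
    start = perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(lambda _: work(arr, repeats), range(n_threads)))
    return perf_counter() - start

print("small arrays:", round(timed(small, 20_000, 1), 3), round(timed(small, 20_000, 2), 3))
print("large arrays:", round(timed(large, 2, 1), 3), round(timed(large, 2, 2), 3))
```

If the large-array row scales with threads while the small-array row does not, you will want a bigger benchmark before drawing conclusions about your real workload.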

There is some additional information on parallel processing with numpy available on the official wiki and a useful post about the Python GIL in general over on Programmers.SE.

mfitzp answered Sep 17 '22 01:09