I am doing heavy image processing in Python 3 on a large batch of images using NumPy and OpenCV. I know Python has the GIL, which prevents two threads from running Python code concurrently. A quick search told me not to use threads in Python for CPU-intensive tasks and to use them only for I/O, such as saving files to disk or database communication. I also read that the GIL is released when working with C extensions. Since NumPy and OpenCV are C and C++ extensions, I get the feeling that the GIL might be released. I am not sure about it, because image processing is a CPU-intensive task. Is my intuition correct, or am I better off using multiprocessing?
To answer it upfront, it depends on the functions you use.
The most reliable way to determine whether a function releases the GIL is to check its source code. Checking the documentation also helps, but GIL behavior is often simply not documented. And yes, it is cumbersome.
The SciPy Cookbook (http://scipy-cookbook.readthedocs.io/items/Multithreading.html) says:
"[...] numpy code often releases the GIL while it is calculating, so that simple parallelism can speed up the code."
Each project might use its own macros, so if you are familiar with the default macros from the Python C API, such as Py_BEGIN_ALLOW_THREADS, you might find them redefined under other names. In NumPy, for instance, you would look for NPY_BEGIN_THREADS_DEF, etc.
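If reading the source is too cumbersome, a rough empirical check is to time the same call run sequentially and run in a thread pool. The sketch below is only an illustration: `work`, `time_sequential`, and `time_threaded` are hypothetical helper names, and `np.sort` is just a placeholder for whichever NumPy or OpenCV call you actually care about.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def work(arr):
    # Placeholder workload (an assumption for this sketch);
    # swap in the NumPy or OpenCV call you want to test.
    return np.sort(arr, axis=None)


def time_sequential(func, inputs):
    # Run func on every input, one after another, in the calling thread.
    start = time.perf_counter()
    results = [func(x) for x in inputs]
    return time.perf_counter() - start, results


def time_threaded(func, inputs, workers):
    # Run func on every input using a pool of `workers` threads.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, inputs))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inputs = [rng.random((2000, 2000)) for _ in range(4)]

    t_seq, _ = time_sequential(work, inputs)
    t_thr, _ = time_threaded(work, inputs, workers=4)

    print(f"sequential: {t_seq:.3f} s")
    print(f"threaded:   {t_thr:.3f} s")
    print(f"threaded / sequential: {t_thr / t_seq:.2f}")
```

A ratio clearly below 1 suggests the function releases the GIL for most of its run time. A ratio near 1 or above suggests it holds the GIL, although it can also mean the function is memory-bound or already uses several threads internally, so treat the result as a hint and not as proof.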