
Is it true that in multiprocessing, each process gets its own GIL in CPython? How different is that from creating new runtimes?

Are there any caveats to it? I have a few questions related to it.

How costly is it to create more GILs? Is it any different from creating a separate Python runtime? Once a new GIL is created, will everything (objects, variables, stack, heap) be created from scratch as required in that process, or is a copy made of everything in the present heap and stack? (Garbage collection would malfunction if the processes were working on the same objects.) Are the pieces of code being executed also copied to new CPU cores? Also, can I relate one GIL to one CPU core?

Copying things is a fairly CPU-intensive task (correct me if I am wrong), so what would be the threshold for deciding whether to go for multiprocessing?

PS: I am talking about CPython but please feel free to extend the answer to whatever you feel is necessary.

sprksh asked Feb 15 '20 07:02



1 Answer

Looking back at this question after 6 months, I feel I can clarify the doubts of my younger self. I hope this is helpful to people who stumble upon it.

Yes, it is true that with the multiprocessing module each process has a separate GIL, and there are no caveats to that. But the question's understanding of the runtime and the GIL is flawed and needs to be corrected.

I will clear up the doubts and answer the questions with a series of statements.

  1. Python code is run by the CPython virtual machine: the source is compiled to CPython bytecode, and that bytecode is then interpreted. This is what constitutes the Python runtime.
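  2. When we create a new process, an entirely new Python virtual machine is launched (which we call the Python process), with its own stack and heap memory. With the fork start method the child begins as a copy-on-write copy of the parent's memory; with the spawn start method it begins as a fresh interpreter that re-imports the main module. Either way, the processes do not share Python objects, so garbage collection in one cannot interfere with another.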
  3. Yes, this is costly, but not too costly, because the Python virtual machine is a piece of C code precompiled to machine code, so a new interpreter starts quickly. To put this in perspective, Java programs rarely use multiprocessing for parallelism: each process would need its own JVM, which takes considerably more memory and startup time than a CPython interpreter, and Java threads can already run in parallel on multiple cores.
  4. The GIL is just a mutex inside the Python virtual machine that lets only one thread at a time execute CPython bytecode. So all the questions about GIL creation and its cost miss the point; the intention was really to ask about the CPython virtual machine.
  5. Can I relate 1 GIL to 1 CPU core? It is better to ask whether 1 Python process can be related to 1 CPU core, and the answer is no. It is the kernel's job to decide which core a process runs on; this keeps changing from time to time, and the process normally has no say in it. The only guarantee is that, at any given point in time, at most one thread in a Python process is executing CPython bytecode (due to the GIL), so pure Python code in one process cannot use more than one core at once. Threads that have released the GIL, for example inside C extensions or blocking I/O, can still run on other cores.
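Statement 2 can be checked with a small experiment. The sketch below is only an illustration: the names counter, bump_and_report and run_demo are made up for this example, and it assumes the fork start method is available (POSIX), falling back to spawn otherwise. The child increments its own copy of a module-level variable, and the parent's copy stays untouched.

```python
import multiprocessing as mp
import os

# Module-level state; each process ends up with its own copy of it.
counter = 0


def bump_and_report(queue):
    """Runs in the child: mutate the child's own copy of `counter`."""
    global counter
    counter += 1
    queue.put((os.getpid(), counter))


def run_demo():
    # Assumption for this sketch: prefer fork where it exists, else spawn.
    method = "fork" if "fork" in mp.get_all_start_methods() else "spawn"
    ctx = mp.get_context(method)
    queue = ctx.Queue()
    child = ctx.Process(target=bump_and_report, args=(queue,))
    child.start()
    # Read the result before joining so the child can flush its queue.
    child_pid, child_counter = queue.get(timeout=30)
    child.join()
    # The parent's counter is not affected by the child's increment.
    return os.getpid(), counter, child_pid, child_counter


if __name__ == "__main__":
    parent_pid, parent_counter, child_pid, child_counter = run_demo()
    print("parent", parent_pid, "counter =", parent_counter)
    print("child ", child_pid, "counter =", child_counter)
```

The two PIDs differ and only the child's counter changes, which is why there is no shared GIL or shared garbage collector to worry about: each process is a separate interpreter with its own objects.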

What gets copied between cores, and how the OS tries to keep a process on the core it is working on, is a separate and very deep topic in itself.
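Without going into that topic, a process can at least ask which cores the kernel is allowed to schedule it on. This is a minimal sketch: allowed_cpus is a hypothetical helper name, os.sched_getaffinity exists only on some platforms (Linux in particular), and the fallback simply assumes every core reported by os.cpu_count() is usable.

```python
import os


def allowed_cpus():
    """Return the CPU indices this process may be scheduled on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: the set of CPUs the kernel may place this process on.
        return sorted(os.sched_getaffinity(0))
    # Fallback assumption: every core reported by the OS is usable.
    return list(range(os.cpu_count() or 1))


if __name__ == "__main__":
    print("this process may run on CPUs:", allowed_cpus())
```

The result is a set of candidates, not a fixed core: the kernel is still free to move the process between any of them.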

The final question is a subjective one. With all this understanding, it comes down to a cost-to-benefit ratio that varies from program to program and depends on how CPU-intensive the work is, how many cores the machine has, and so on. So it cannot be generalised.
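One way to get a feel for that ratio on a particular machine is to measure what starting a process costs and compare it with the time the work itself takes. The following is a rough sketch, not a rule: average_process_overhead and estimated_speedup are hypothetical helpers, and the speedup model (a perfect split of the work plus one start-up cost per worker) is a deliberately crude assumption.

```python
import multiprocessing as mp
import time


def noop():
    """Child does nothing; we only pay for process creation and teardown."""


def average_process_overhead(repeats=3):
    """Average wall-clock seconds to start and join one do-nothing process."""
    method = "fork" if "fork" in mp.get_all_start_methods() else "spawn"
    ctx = mp.get_context(method)
    start = time.perf_counter()
    for _ in range(repeats):
        child = ctx.Process(target=noop)
        child.start()
        child.join()
    return (time.perf_counter() - start) / repeats


def estimated_speedup(serial_seconds, overhead_seconds, workers):
    """Crude model: work splits perfectly, each worker costs one start-up."""
    parallel_seconds = serial_seconds / workers + overhead_seconds * workers
    return serial_seconds / parallel_seconds


if __name__ == "__main__":
    overhead = average_process_overhead()
    print(f"process start-up overhead: {overhead:.4f} s")
    for serial in (0.001, 0.1, 10.0):
        speedup = estimated_speedup(serial, overhead, workers=4)
        print(f"{serial:>6} s of work -> estimated speedup {speedup:.2f}x")
```

An estimate below 1 means the start-up cost outweighs the gain, so multiprocessing tends to pay off only when each worker gets a substantial amount of CPU-bound work. Real programs also pay for pickling arguments and results between processes, which this model ignores.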

sprksh answered Sep 21 '22 06:09