When calling a linux binary which takes a relatively long time through Python's subprocess
module, does this release the GIL?
I want to parallelise some code which calls a binary program from the command line. Is it better to use threads (through threading and a multiprocessing.pool.ThreadPool) or multiprocessing? My assumption is that if subprocess releases the GIL then the threading option is the better choice.
The subprocess module in Python is used to run external commands and applications by creating new processes. It lets you start new applications directly from the Python program you are writing. So, if you want to run an external program from a git repository, or a compiled C or C++ binary, you can use subprocess in Python.
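For example, a minimal sketch (ls -l is purely a stand-in for whatever external binary you need to run):
import subprocess

# Run an external binary and wait for it to finish.
# check=True raises CalledProcessError on a non-zero exit status.
result = subprocess.run(["ls", "-l"], capture_output=True, text=True, check=True)
print(result.stdout)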
Don't expect Python 3.11 to drop the GIL just yet. Merging Sam Gross's nogil work back into CPython will itself be a laborious process, and it is only part of what's needed: the community needs a very good backwards-compatibility and migration plan before CPython drops the GIL. None of this is planned yet.
Note also that the asyncio.subprocess.Process class is not thread safe; see the Concurrency and multithreading in asyncio section of the documentation.
When calling a linux binary which takes a relatively long time through Python's subprocess module, does this release the GIL?
Yes, it releases the Global Interpreter Lock (GIL) in the calling process.
As you are likely aware, on POSIX platforms subprocess offers convenience interfaces atop the "raw" components fork, execve, and waitpid.
By inspection of the CPython 2.7.9 sources, fork and execve do not release the GIL. However, those calls do not block, so we would not expect the GIL to be released there. waitpid, of course, does block, and we can see that its implementation gives up the GIL using the ALLOW_THREADS macros:
static PyObject *
posix_waitpid(PyObject *self, PyObject *args)
{
....
Py_BEGIN_ALLOW_THREADS
pid = waitpid(pid, &status, options);
Py_END_ALLOW_THREADS
....
This can also be verified empirically by calling out to a long-running program like sleep from a multithreaded Python script.
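A minimal sketch of such a test (the five threads and the 2-second sleep are arbitrary choices): if the GIL were held while waiting on the child, the threads would serialize and the script would take roughly 10 seconds; in practice it finishes in about 2.
import subprocess
import threading
import time

def run_sleep():
    # subprocess.call blocks in waitpid, which releases the GIL
    subprocess.call(["sleep", "2"])

start = time.time()
threads = [threading.Thread(target=run_sleep) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Prints ~2s, not ~10s, because the GIL is released while waiting
print("elapsed: %.1fs" % (time.time() - start))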
The GIL doesn't span multiple processes. subprocess.Popen starts a new process; if that happens to be a Python process, it will have its own GIL.
You don't need multiple threads (or processes created by multiprocessing) if all you want is to run some linux binaries in parallel:
from subprocess import Popen
# start all processes
processes = [Popen(['program', str(i)]) for i in range(10)]
# now all processes run in parallel
# wait for processes to complete
for p in processes:
p.wait()
You could use multiprocessing.pool.ThreadPool to limit the number of concurrently running programs.
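A minimal sketch, reusing the hypothetical program binary from the snippet above and capping concurrency at four workers:
from multiprocessing.pool import ThreadPool
from subprocess import call

def run(i):
    # call() blocks its worker thread until the binary exits
    return call(["program", str(i)])

# at most 4 programs run at the same time
pool = ThreadPool(4)
exit_codes = pool.map(run, range(10))
pool.close()
pool.join()
print(exit_codes)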
Since subprocess is for running executables (it is essentially a wrapper around os.fork() and os.execve()), it probably makes more sense to use it here. You can use subprocess.Popen. Something like:
import subprocess
process = subprocess.Popen(["binary"])
This will run as a separate process, hence not being affected by the GIL. You can then use the Popen.poll() method to check whether the child process has terminated:
if process.poll() is not None:
    # poll() returns None while the child is still running
    returncode = process.returncode
Just make sure you don't call any of the methods that wait for the process to finish (e.g. Popen.communicate()), to avoid your Python script blocking.
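For example, a non-blocking polling loop might look like this sketch (the commented-out do_other_work() is a hypothetical placeholder for whatever else your script does):
import subprocess
import time

process = subprocess.Popen(["binary"])

# poll() returns None while the child is still running
while process.poll() is None:
    # do_other_work()  # hypothetical: keep the script busy elsewhere
    time.sleep(0.1)    # avoid a busy-wait

print("exit status:", process.returncode)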
As mentioned in this answer, multiprocessing is for running functions within your existing (Python) code, with support for more flexible communication among the family of processes. The multiprocessing module is intended to provide interfaces and features very similar to threading, while allowing CPython to scale your processing across multiple CPUs/cores despite the GIL.
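For contrast, a minimal multiprocessing sketch: it parallelises a Python function (here a trivial square(), not an external binary) across worker processes, each with its own GIL:
from multiprocessing import Pool

def square(x):
    # pure-Python work; each worker process has its own interpreter and GIL
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(square, range(10)))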
So, given your use case, subprocess seems to be the right choice.