 

Does using the subprocess module release the python GIL?

When calling a Linux binary which takes a relatively long time through Python's subprocess module, does this release the GIL?

I want to parallelise some code which calls a binary program from the command line. Is it better to use threads (through threading and a multiprocessing.pool.ThreadPool) or multiprocessing? My assumption is that if subprocess releases the GIL then choosing the threading option is better.

Simon Walker asked Apr 29 '14 15:04


People also ask

What does the subprocess module do in Python?

The subprocess module lets you spawn new processes, connect to their input/output/error pipes, and obtain their return codes. In other words, it lets you start other applications directly from the Python program you are writing, so if you want to run external programs, such as compiled C or C++ binaries, you can use subprocess.
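A minimal sketch of typical use, assuming Python 3.7+ (for `capture_output` and `text`). Here `sys.executable` stands in for whatever external binary you actually want to run, so the example works on any machine with Python installed.

```python
import subprocess
import sys

# Run an external program (another Python interpreter, for portability)
# and capture what it prints.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout/stderr instead of inheriting them
    text=True,            # decode the captured bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)

print(result.returncode)      # 0
print(result.stdout.strip())  # hello from a child process
```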

Will Python ever remove the GIL?

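Not in Python 3.11. Merging Sam Gross's work back into CPython is itself a laborious process, and it is only part of what is needed: before CPython can drop the GIL, the community needs a very good backwards-compatibility and migration plan. None of this was planned at the time of writing; PEP 703 (making the GIL optional) has since been accepted, and Python 3.13 offers an experimental free-threaded build, while the default build still has the GIL.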

Is subprocess thread-safe in Python?

asyncio's subprocess Process class (asyncio.subprocess.Process) is not thread-safe; see the "Concurrency and multithreading" section of the asyncio documentation.
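One way to stay clear of that restriction is to create and await every Process object from a single asyncio event loop. The sketch below assumes Python 3.7+ (for `asyncio.run`) and uses a child Python interpreter as a stand-in for a real binary; `run_child` is a hypothetical helper name.

```python
import asyncio
import sys


async def run_child(i: int) -> str:
    # Start a child process without blocking the event loop.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", f"print({i} * {i})",
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()  # wait for exit and read stdout
    return stdout.decode().strip()


async def main() -> list:
    # All children run concurrently; gather returns results in input order.
    return await asyncio.gather(*(run_child(i) for i in range(4)))


results = asyncio.run(main())
print(results)  # ['0', '1', '4', '9']
```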


3 Answers

When calling a Linux binary which takes a relatively long time through Python's subprocess module, does this release the GIL?

Yes, it releases the Global Interpreter Lock (GIL) in the calling process.

As you are likely aware, on POSIX platforms subprocess offers convenience interfaces atop the "raw" components from fork, execve, and waitpid.

By inspection of the CPython 2.7.9 sources, fork and execve do not release the GIL. However, those calls do not block, so we'd not expect the GIL to be released.
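To make the three raw calls concrete, here is a rough, POSIX-only sketch of the fork/execve/waitpid sequence. It is an illustration only, not how subprocess is actually implemented (CPython 3 performs the fork and exec in C via the private `_posixsubprocess` module and handles pipes, error reporting, and much more); `spawn_and_wait` is a hypothetical name.

```python
import os
import sys


def spawn_and_wait(argv):
    """Run argv[0] with arguments argv and return its exit code (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        try:
            os.execve(argv[0], argv, os.environ)
        finally:
            # Reached only if execve failed; leave without Python cleanup.
            os._exit(127)
    # Parent: block until the child exits. This is the call that blocks,
    # and per the CPython source it runs with the GIL released.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # e.g. the child was killed by a signal


code = spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(3)"])
print(code)  # 3
```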

waitpid, of course, does block, but we see that its implementation gives up the GIL using the Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS macros:

static PyObject *
posix_waitpid(PyObject *self, PyObject *args)
{
....
Py_BEGIN_ALLOW_THREADS
pid = waitpid(pid, &status, options);
Py_END_ALLOW_THREADS
....

This could also be tested by calling out to a long-running program such as sleep from a small multithreaded Python script.
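A minimal sketch of that test, assuming a child Python interpreter that sleeps for one second as the stand-in for a long-running binary (on POSIX, `["sleep", "1"]` works too). Four threads each run one child; if the blocking wait held the GIL, the children would largely run one after another.

```python
import subprocess
import sys
import threading
import time

# Stand-in for a long-running binary: a child interpreter that sleeps 1 s.
CHILD = [sys.executable, "-c", "import time; time.sleep(1)"]
N = 4
codes = [None] * N


def worker(idx):
    # On POSIX, subprocess.run blocks in os.waitpid() until the child exits.
    # Because waitpid releases the GIL, the other threads keep running and
    # start their own children in the meantime.
    codes[idx] = subprocess.run(CHILD).returncode


start = time.monotonic()
threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

print(codes)               # [0, 0, 0, 0]
print(f"{elapsed:.1f} s")  # typically a little over 1 s, not about N s
```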

pilcrow answered Oct 03 '22 02:10


The GIL doesn't span multiple processes. subprocess.Popen starts a new process; if that process is a Python interpreter, it has its own GIL.

You don't need multiple threads (or processes created by multiprocessing) if all you want is to run some Linux binaries in parallel:

from subprocess import Popen

# start all processes
processes = [Popen(['program', str(i)]) for i in range(10)]
# now all processes run in parallel

# wait for processes to complete
for p in processes:
    p.wait()

You could use multiprocessing.pool.ThreadPool to limit the number of concurrently running programs.
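A minimal sketch of that approach, assuming a hypothetical `run_program` helper and a tiny child Python interpreter that prints its argument as the stand-in for `program i` from the snippet above. The pool size caps how many children exist at once.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def run_program(i):
    # Hypothetical stand-in for `program i`: a child that prints its argument.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", str(i)]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.strip()


# At most 3 children run at any moment; the remaining calls queue in the pool.
with ThreadPool(processes=3) as pool:
    outputs = pool.map(run_program, range(10))

print(outputs)  # ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
```

The threads themselves do almost nothing but wait, so the GIL is not a bottleneck here; the actual work happens in the child processes.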

jfs answered Oct 03 '22 01:10


Since subprocess is for running executables (on POSIX it is essentially a wrapper around os.fork() and os.execve()), it probably makes more sense to use it here. You can use subprocess.Popen, for example:

import subprocess

process = subprocess.Popen(["binary"])

This runs the binary as a separate process, so it is not affected by your script's GIL. You can then use the Popen.poll() method to check whether the child process has terminated:

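if process.poll() is not None:
    # process has finished its work (poll() returns None while it is still
    # running, and a successful exit code of 0 is falsy, so test against None)
    returncode = process.returncode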

Just make sure you don't call any of the methods that wait for the process to finish (e.g. Popen.communicate() or Popen.wait()), so that your Python script does not block.
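A minimal sketch of a non-blocking polling loop over several children, assuming three hypothetical child interpreters that sleep briefly and exit with codes 0, 1 and 2. The `time.sleep` call is a placeholder for whatever other work the script would do between checks.

```python
import subprocess
import sys
import time

# Hypothetical workload: children that sleep 0.2 s, then exit with code i.
procs = [
    subprocess.Popen(
        [sys.executable, "-c", f"import sys, time; time.sleep(0.2); sys.exit({i})"]
    )
    for i in range(3)
]

returncodes = {}  # index -> exit code, filled in as children finish
while len(returncodes) < len(procs):
    for idx, p in enumerate(procs):
        # poll() never blocks: it returns None while the child is running
        # and the exit code once it has finished.
        if idx not in returncodes and p.poll() is not None:
            returncodes[idx] = p.returncode
    time.sleep(0.05)  # placeholder for other useful work

print(sorted(returncodes.items()))  # [(0, 0), (1, 1), (2, 2)]
```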

As mentioned in this answer

multiprocessing is for running functions within your existing (Python) code with support for more flexible communications among the family of processes. multiprocessing module is intended to provide interfaces and features which are very similar to threading while allowing CPython to scale your processing among multiple CPUs/cores despite the GIL.

So, given your use-case, subprocess seems to be the right choice.

s16h answered Oct 03 '22 02:10