I am using Python's subprocess module to start a new process. I would like to capture the output of the new process in real time so I can do things with it (display it, parse it, etc.). I have seen many examples of how this can be done: some use custom file-like objects, some use threading, and some attempt to read the output until the process has completed.
- File Like Objects Example (click me): uses custom file-like objects for stdin, stdout and stderr.
- Threading Example (click me): uses threads to poll the stdout and stderr values.
- Read Output Example (see below): reads the output until the process has completed.
The example which makes the most sense to me is to read from stdout and stderr until the process has finished. Here is some example code:
import subprocess
import sys

# Start a process which prints the options to the python program.
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    # Text mode: readline() returns str and "" signals EOF.
    universal_newlines=True,
)

# While the process is running, display the output to the user.
while True:
    # Read standard output data.
    for stdout_line in iter(process.stdout.readline, ""):
        # Display standard output data.
        sys.stdout.write(stdout_line)
    # Read standard error data.
    for stderr_line in iter(process.stderr.readline, ""):
        # Display standard error data.
        sys.stderr.write(stderr_line)
    # If the process is complete - exit loop.
    if process.poll() is not None:
        break
My question is:
Q. Is there a recommended approach for capturing the output of a process using Python?
First, your design is a bit silly, since you can do the same thing like this:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdout=sys.stdout,
    stderr=sys.stderr,
)
… or, even better:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
)
However, I'll assume that's just a toy example, and you might want to do something more useful.
The main problem with your design is that it won't read anything from stderr until stdout is done.
Imagine you're driving an MP3 player that prints each track name to stdout, and logging info to stderr, and you want to play 10 songs. Do you really want to wait 30 minutes before displaying any of the logging to your users?
If that is acceptable, then you might as well just use communicate, which takes care of all of the headaches for you.
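For example, a minimal sketch of the communicate approach (it blocks until the child exits, but reads both pipes concurrently behind the scenes so neither can fill up and deadlock the child):

import subprocess

process = subprocess.Popen(
    ["python", "-h"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
# Collect everything the child wrote, all at once, after it finishes.
stdout_data, stderr_data = process.communicate()
print(stdout_data)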
Plus, even if it's acceptable for your model, are you sure you can queue up that much unread data in the pipe without it blocking the child? On every platform?
Just breaking up the loop to alternate between the two won't help, because you could end up blocking on stdout.readline() for 5 minutes while stderr is piling up.
So that's why you need some way to read from both at once.
How do you read from two pipes at once?
This is the same problem (but smaller) as handling 1000 network clients at once, and it has the same solutions: threading, or multiplexing (and the various hybrids, like doing green threads on top of a multiplexor and event loop, or using a threaded proactor, etc.).
The best sample code for the threaded version is communicate from the 3.2+ source code. It's a little complicated, but if you want to handle all of the edge cases properly on both Windows and Unix, there's really no avoiding a bit of complexity.
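To illustrate the idea, here is a stripped-down sketch (not the battle-tested communicate implementation; the pump helper is just an illustrative name): one background thread per pipe keeps both draining at once.

import subprocess
import sys
import threading

def pump(pipe, sink):
    # Copy lines from one of the child's pipes to sink as they arrive.
    for line in iter(pipe.readline, ""):
        sink.write(line)
    pipe.close()

process = subprocess.Popen(
    ["python", "-h"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
threads = [
    threading.Thread(target=pump, args=(process.stdout, sys.stdout)),
    threading.Thread(target=pump, args=(process.stderr, sys.stderr)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
process.wait()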
For multiplexing, you can use the select module, but keep in mind that it only works on Unix (you can't select on pipes on Windows), it's buggy without 3.2+ (or the subprocess32 backport), and to really get all the edge cases right you need to add a signal handler to your select. Unless you really, really don't want to use threading, this is the harder answer.
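If you do go that way, a bare-bones Unix-only sketch (deliberately ignoring the signal handling and edge cases mentioned above) looks something like this:

import os
import select
import subprocess
import sys

process = subprocess.Popen(
    ["python", "-h"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
# Map each pipe's file descriptor to the stream we echo it to.
fds = {process.stdout.fileno(): sys.stdout,
       process.stderr.fileno(): sys.stderr}
while fds:
    # Block until at least one pipe has data (or has hit EOF).
    readable, _, _ = select.select(list(fds), [], [])
    for fd in readable:
        data = os.read(fd, 4096)
        if data:
            fds[fd].write(data.decode())
        else:
            # EOF: the child closed this pipe.
            del fds[fd]
process.wait()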
But the easy answer is to use someone else's implementation. There are a dozen or more modules on PyPI specifically for async subprocesses. Alternatively, if you already have a good reason to write your app around an event loop, just about every modern event-loop-driven async networking library (including the stdlib's asyncio) includes subprocess support out of the box, and it works on both Unix and Windows.
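For instance, with the stdlib's asyncio, both pipes can be drained concurrently; this sketch uses the newer asyncio.run API (3.7+) for brevity:

import asyncio
import sys

async def pump(stream, sink):
    # Echo lines from one of the child's pipes as they arrive.
    while True:
        line = await stream.readline()
        if not line:
            break
        sink.write(line.decode())

async def main():
    process = await asyncio.create_subprocess_exec(
        sys.executable, "-h",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Read stdout and stderr at the same time, then reap the child.
    await asyncio.gather(
        pump(process.stdout, sys.stdout),
        pump(process.stderr, sys.stderr),
    )
    await process.wait()

asyncio.run(main())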
Is there a recommended approach for capturing the output of a process using python?
It depends on who you're asking; a thousand Python developers might have a thousand different answers… or at least half a dozen. If you're asking what the core devs would recommend, I can take a guess:
If you don't need to capture it asynchronously, use communicate (but make sure to upgrade to at least 3.2 for important bug fixes). If you do need to capture it asynchronously, use asyncio (which requires 3.4).