I am using Python's subprocess module to start a new process. I would like to capture the output of the new process in real time so I can do things with it (display it, parse it, etc.). I have seen many examples of how this can be done: some use custom file-like objects, some use threading, and some attempt to read the output until the process has completed.
- File Like Objects Example (click me): uses custom file-like objects for stdin, stdout and stderr.
- Threading Example (click me): uses threads to poll the stdout and stderr values.
- Read Output Example (see below): reads the output until the process has completed.
The example which makes the most sense to me is to read from stdout and stderr until the process has finished. Here is some example code:
import subprocess
import sys

# Start a process which prints the options to the python program.
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    # Text mode: readline() returns str and "" signals EOF.
    universal_newlines=True,
)

# While the process is running, display the output to the user.
while True:
    # Read standard output data.
    for stdout_line in iter(process.stdout.readline, ""):
        # Display standard output data.
        sys.stdout.write(stdout_line)
    # Read standard error data.
    for stderr_line in iter(process.stderr.readline, ""):
        # Display standard error data.
        sys.stderr.write(stderr_line)
    # If the process is complete - exit loop.
    if process.poll() is not None:
        break
My question is:
Q. Is there a recommended approach for capturing the output of a process using Python?
First, your design is a bit silly, since you can do the same thing like this:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdout=sys.stdout,
    stderr=sys.stderr,
)
… or, even better:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
)
However, I'll assume that's just a toy example, and you might want to do something more useful.
The main problem with your design is that it won't read anything from stderr until stdout is done.
Imagine you're driving an MP3 player that prints each track name to stdout, and logging info to stderr, and you want to play 10 songs. Do you really want to wait 30 minutes before displaying any of the logging to your users?
If that is acceptable, then you might as well just use communicate, which takes care of all of the headaches for you.
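For example, a minimal sketch of the communicate approach (it blocks until the child exits, but reads both pipes concurrently behind the scenes so neither can fill up and deadlock the child):

import subprocess

process = subprocess.Popen(
    ["python", "-h"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
# Collect everything the child wrote, all at once, after it finishes.
stdout_data, stderr_data = process.communicate()
print(stdout_data)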
Plus, even if it's acceptable for your model, are you sure you can queue up that much unread data in the pipe without it blocking the child? On every platform?
Just breaking up the loop to alternate between the two won't help, because you could end up blocking on stdout.readline() for 5 minutes while stderr is piling up.
So that's why you need some way to read from both at once.
How do you read from two pipes at once?
This is the same problem (but smaller) as handling 1000 network clients at once, and it has the same solutions: threading, or multiplexing (and the various hybrids, like doing green threads on top of a multiplexor and event loop, or using a threaded proactor, etc.).
The best sample code for the threaded version is communicate from the 3.2+ source code. It's a little complicated, but if you want to handle all of the edge cases properly on both Windows and Unix, there's really no avoiding a bit of complexity.
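To illustrate the idea, here is a stripped-down sketch (not the battle-tested communicate implementation; the pump helper is just an illustrative name): one background thread per pipe keeps both draining at once.

import subprocess
import sys
import threading

def pump(pipe, sink):
    # Copy lines from one of the child's pipes to sink as they arrive.
    for line in iter(pipe.readline, ""):
        sink.write(line)
    pipe.close()

process = subprocess.Popen(
    ["python", "-h"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
threads = [
    threading.Thread(target=pump, args=(process.stdout, sys.stdout)),
    threading.Thread(target=pump, args=(process.stderr, sys.stderr)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
process.wait()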
For multiplexing, you can use the select module, but keep in mind that it only works on Unix (you can't select on pipes on Windows), it's buggy without 3.2+ (or the subprocess32 backport), and to really get all the edge cases right you need to add a signal handler to your select. Unless you really, really don't want to use threading, this is the harder answer.
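If you do go that way, a bare-bones Unix-only sketch (deliberately ignoring the signal handling and edge cases mentioned above) looks something like this:

import os
import select
import subprocess
import sys

process = subprocess.Popen(
    ["python", "-h"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
# Map each pipe's file descriptor to the stream we echo it to.
fds = {process.stdout.fileno(): sys.stdout,
       process.stderr.fileno(): sys.stderr}
while fds:
    # Block until at least one pipe has data (or has hit EOF).
    readable, _, _ = select.select(list(fds), [], [])
    for fd in readable:
        data = os.read(fd, 4096)
        if data:
            fds[fd].write(data.decode())
        else:
            # EOF: the child closed this pipe.
            del fds[fd]
process.wait()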
But the easy answer is to use someone else's implementation. There are a dozen or more modules on PyPI specifically for async subprocesses. Alternatively, if you already have a good reason to write your app around an event loop, just about every modern event-loop-driven async networking library (including the stdlib's asyncio) includes subprocess support out of the box, and it works on both Unix and Windows.
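For instance, with the stdlib's asyncio, both pipes can be drained concurrently; this sketch uses the newer asyncio.run API (3.7+) for brevity:

import asyncio
import sys

async def pump(stream, sink):
    # Echo lines from one of the child's pipes as they arrive.
    while True:
        line = await stream.readline()
        if not line:
            break
        sink.write(line.decode())

async def main():
    process = await asyncio.create_subprocess_exec(
        sys.executable, "-h",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Read stdout and stderr at the same time, then reap the child.
    await asyncio.gather(
        pump(process.stdout, sys.stdout),
        pump(process.stderr, sys.stderr),
    )
    await process.wait()

asyncio.run(main())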
Is there a recommended approach for capturing the output of a process using python?
It depends on who you're asking; a thousand Python developers might have a thousand different answers… or at least half a dozen. If you're asking what the core devs would recommend, I can take a guess:
If you don't need to capture it asynchronously, use communicate (but make sure to upgrade to at least 3.2 for important bug fixes). If you do need to capture it asynchronously, use asyncio (which requires 3.4).