What is the best way to capture output from a process using python?

I am using python's subprocess module to start a new process. I would like to capture the output of the new process in real time so I can do things with it (display it, parse it, etc.). I have seen many examples of how this can be done, some use custom file-like objects, some use threading and some attempt to read the output until the process has completed.

File Like Objects Example

  • I would prefer not to use custom file-like objects because I want to allow users to supply their own values for stdin, stdout and stderr.

Threading Example

  • I do not really understand why threading is required, so I am reluctant to follow this example. If someone can explain why the threading example makes sense, I would be happy to listen. However, this example also restricts users from supplying their own stdout and stderr values.

Read Output Example (see below)

The example which makes the most sense to me is to read stdout and stderr until the process has finished. Here is some example code:

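import subprocess
import sys

# Start a process which prints the options to the python program.
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,  # line-buffered; only honoured in text mode
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # text mode, so readline() returns str and "" marks EOF
)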

# While the process is running, display the output to the user.
while True:

    # Read standard output data.
    for stdout_line in iter(process.stdout.readline, ""):

        # Display standard output data.
        sys.stdout.write(stdout_line)

    # Read standard error data.
    for stderr_line in iter(process.stderr.readline, ""):

        # Display standard error data.
        sys.stderr.write(stderr_line)

    # If the process is complete - exit loop.
    if process.poll() is not None:
        break

My question is,

Q. Is there a recommended approach for capturing the output of a process using python?

Yani asked Jan 14 '14 02:01



1 Answer

First, your design is a bit silly, since you can do the same thing like this:

process = subprocess.Popen(
                           ["python", "-h"],
                           bufsize=1,
                           stdout=sys.stdout,
                           stderr=sys.stderr
                           )

… or, even better:

process = subprocess.Popen(
                           ["python", "-h"],
                           bufsize=1
                           )

However, I'll assume that's just a toy example, and you might want to do something more useful.


The main problem with your design is that it won't read anything from stderr until stdout is done.

Imagine you're driving an MP3 player that prints each track name to stdout, and logging info to stderr, and you want to play 10 songs. Do you really want to wait 30 minutes before displaying any of the logging to your users?

If that is acceptable, then you might as well just use communicate, which takes care of all of the headaches for you.
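As a minimal sketch of the communicate route, assuming you only need the output once the child has exited: the helper name run_and_capture and the one-line child command are made up for illustration and are not from the question.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command to completion; return (returncode, stdout, stderr) as text."""
    process = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the pipes to str
    )
    # communicate() reads both pipes until EOF and then waits for the child,
    # so neither pipe can fill up and stall the process.
    out, err = process.communicate()
    return process.returncode, out, err


if __name__ == "__main__":
    # Hypothetical child that writes one line to each stream.
    child = [sys.executable, "-c",
             "import sys; print('out'); print('err', file=sys.stderr)"]
    returncode, out, err = run_and_capture(child)
    sys.stdout.write(out)
    sys.stderr.write(err)
```

Nothing is available until the child exits, which is exactly the trade-off described above.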

Plus, even if it's acceptable for your model, are you sure you can queue up that much unread data in the pipe without it blocking the child? On every platform?

Just breaking up the loop to alternate between the two won't help, because you could end up blocking on stdout.readline() for 5 minutes while stderr is piling up.

So that's why you need some way to read from both at once.


How do you read from two pipes at once?

This is the same problem (but smaller) as handling 1000 network clients at once, and it has the same solutions: threading, or multiplexing (and the various hybrids, like doing green threads on top of a multiplexor and event loop, or using a threaded proactor, etc.).

The best sample code for the threaded version is communicate from the 3.2+ source code. It's a little complicated, but if you want to handle all of the edge cases properly on both Windows and Unix there's really no avoiding a bit of complexity.
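For a rough idea of the threaded shape, here is a much simpler sketch than what communicate does internally. It is not the stdlib implementation and skips most of those edge cases; the names _pump and iter_output are hypothetical. One thread per pipe pushes lines onto a shared queue, and the caller consumes them as they arrive.

```python
import queue
import subprocess
import sys
import threading


def _pump(stream, name, out_queue):
    """Forward each line from one pipe to the shared queue, then mark EOF."""
    for line in stream:
        out_queue.put((name, line))
    stream.close()
    out_queue.put((name, None))  # None signals that this stream is finished


def iter_output(args):
    """Yield (stream_name, line) pairs as the child produces them."""
    process = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    lines = queue.Queue()
    for name, stream in (("stdout", process.stdout), ("stderr", process.stderr)):
        threading.Thread(
            target=_pump, args=(stream, name, lines), daemon=True
        ).start()

    open_streams = 2
    while open_streams:
        name, line = lines.get()
        if line is None:
            open_streams -= 1
        else:
            yield name, line
    process.wait()


if __name__ == "__main__":
    # Same toy command as in the question.
    for name, line in iter_output([sys.executable, "-h"]):
        target = sys.stdout if name == "stdout" else sys.stderr
        target.write(line)
```

Because each pipe has its own reader, a quiet stdout can no longer hold up stderr. The relative order of lines across the two streams is not guaranteed.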

For multiplexing, you can use the select module, but keep in mind that this only works on Unix (you can't select on pipes on Windows), and it's buggy without 3.2+ (or the subprocess32 backport), and to really get all the edge cases right you need to add a signal handler to your select. Unless you really, really don't want to use threading, this is the harder answer.
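A Unix-only sketch of the multiplexing approach is below. It uses the selectors module (a higher-level wrapper over select, available from Python 3.4) rather than select directly, and it ignores the signal-handling concerns mentioned above. The function name iter_chunks and the 4096-byte read size are arbitrary choices for illustration.

```python
import os
import selectors
import subprocess
import sys


def iter_chunks(args):
    """Yield (stream_name, bytes_chunk) as data arrives on either pipe."""
    process = subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    with selectors.DefaultSelector() as selector:
        selector.register(process.stdout, selectors.EVENT_READ, "stdout")
        selector.register(process.stderr, selectors.EVENT_READ, "stderr")
        # Keep going until both pipes have reported EOF.
        while selector.get_map():
            for key, _events in selector.select():
                # Read straight from the descriptor so we never block
                # waiting for a complete line.
                chunk = os.read(key.fileobj.fileno(), 4096)
                if chunk:
                    yield key.data, chunk
                else:
                    # An empty read means the child closed this pipe.
                    selector.unregister(key.fileobj)
                    key.fileobj.close()
    process.wait()


if __name__ == "__main__":
    for name, chunk in iter_chunks([sys.executable, "-h"]):
        target = sys.stdout if name == "stdout" else sys.stderr
        target.write(chunk.decode(errors="replace"))
```

This yields raw chunks, not lines, so a parser built on top of it would need to buffer and split the data itself.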

But the easy answer is to use someone else's implementation. There are a dozen or more modules on PyPI specifically for async subprocesses. Alternatively, if you already have a good reason to write your app around an event loop, just about every modern event-loop-driven async networking library (including the stdlib's asyncio) includes subprocess support out of the box, and it works on both Unix and Windows.
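For the asyncio route, a sketch could look like the following. It assumes a newer Python than the answer had in mind (async/await syntax needs 3.5+, asyncio.run needs 3.7+), and the names run_streaming, _read_lines and handle_line are hypothetical.

```python
import asyncio
import sys


async def _read_lines(stream, name, handle_line):
    """Pass each decoded line from one pipe to the callback until EOF."""
    while True:
        line = await stream.readline()
        if not line:  # b"" means the child closed this pipe
            break
        handle_line(name, line.decode(errors="replace"))


async def run_streaming(args, handle_line):
    """Run a command, streaming stdout and stderr concurrently."""
    process = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Both readers run on the event loop, so neither pipe starves the other.
    await asyncio.gather(
        _read_lines(process.stdout, "stdout", handle_line),
        _read_lines(process.stderr, "stderr", handle_line),
    )
    return await process.wait()


if __name__ == "__main__":
    def show(name, line):
        target = sys.stdout if name == "stdout" else sys.stderr
        target.write(line)

    asyncio.run(run_streaming([sys.executable, "-h"], show))
```

The callback keeps the sketch short; an application already built around an event loop would more likely consume the lines from a coroutine of its own.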


Is there a recommended approach for capturing the output of a process using python?

It depends on who you're asking; a thousand Python developers might have a thousand different answers… or at least half a dozen. If you're asking what the core devs would recommend, I can take a guess:

If you don't need to capture it asynchronously, use communicate (but make sure to upgrade to at least 3.2 for important bug fixes). If you do need to capture it asynchronously, use asyncio (which requires 3.4).

abarnert answered Sep 19 '22 18:09