 

Python's Popen cleanup

I wanted a Python equivalent to piping some shell commands in Perl, something like the Python version of open(PIPE, "command |").

I went to the subprocess module and tried this:

import subprocess
p = subprocess.Popen("zgrep thingiwant largefile", shell=True, stdout=subprocess.PIPE)

This works for reading the output the same way I would in Perl, but it doesn't clean up after itself. When I exit the interpreter, I get

grep: writing output: Broken pipe

spewed all over stderr a few million times. I guess I had naively hoped all this would be taken care of for me, but that's not the case. Calling terminate() or kill() on p doesn't seem to help. Looking at the process table, I see that this kills the /bin/sh process but leaves the child gzip in place to complain about the broken pipe.

What's the right way to do this?

asked Apr 07 '10 20:04 by pythonic metaphor
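On the terminate/kill point specifically: with shell=True, p refers to the /bin/sh process, so p.terminate() signals only the shell and not the processes it started. A minimal POSIX-only sketch, assuming Python 3.2+ for start_new_session, starts the shell in its own process group and signals the whole group with os.killpg. The pipeline "sleep 30 | cat" is a hypothetical stand-in for the zgrep command.

```python
import os
import signal
import subprocess

# POSIX-only sketch. start_new_session=True runs setsid() in the child, so the
# shell leads a new process group that the processes it spawns inherit.
p = subprocess.Popen(
    "sleep 30 | cat",  # hypothetical stand-in for "zgrep thingiwant largefile"
    shell=True,
    stdout=subprocess.PIPE,
    start_new_session=True,
)

# Signal every process in the group, not just the /bin/sh that p refers to.
os.killpg(os.getpgid(p.pid), signal.SIGTERM)
p.stdout.close()
p.wait()  # reap the shell; returncode is -SIGTERM if the signal killed it
print(p.returncode)
```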


2 Answers

The issue is that the pipe is full. The subprocess stops, waiting for the pipe to empty out, but then your process (the Python interpreter) quits, breaking its end of the pipe (hence the error message).
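To make "the pipe is full" concrete: an OS pipe has a fixed-size kernel buffer, and a writer blocks once that buffer is full until someone reads. A small POSIX sketch (Python 3.5+ for os.set_blocking; the helper name pipe_capacity is hypothetical) measures the buffer by writing to a non-blocking pipe until the kernel refuses more data. On Linux the default is usually 64 KiB.

```python
import os

def pipe_capacity(chunk_size=1024):
    """Count the bytes a fresh pipe accepts before a write would block."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # a full pipe now raises BlockingIOError instead of blocking
    total = 0
    try:
        while True:
            try:
                total += os.write(w, b"x" * chunk_size)
            except BlockingIOError:
                break  # buffer full: a blocking writer (like zgrep) would stall here
    finally:
        os.close(r)
        os.close(w)
    return total

print(pipe_capacity())
```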

p.wait() will not help you:

Warning: This will deadlock if the child process generates enough output to a stdout or stderr pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.

http://docs.python.org/library/subprocess.html#subprocess.Popen.wait
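The wait() warning is easy to reproduce. In this sketch (Python 3.3+ for the timeout argument), a Python one-liner stands in for zgrep as a hypothetical example: it writes about 1 MB to a pipe that nobody reads, so wait() cannot return and the timeout fires instead of the script hanging forever.

```python
import subprocess
import sys

# Hypothetical stand-in child: writes ~1 MB, far more than a pipe buffer holds.
child_code = "import sys; sys.stdout.write('x' * 1_000_000)"
p = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)

try:
    # Nobody reads p.stdout, so the child blocks on write and never exits.
    p.wait(timeout=2)
    deadlocked = False
except subprocess.TimeoutExpired:
    deadlocked = True
finally:
    p.kill()          # clean up the stuck child
    p.stdout.close()
    p.wait()

print("deadlocked:", deadlocked)
```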

p.communicate() will not help you:

Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.

http://docs.python.org/library/subprocess.html#subprocess.Popen.communicate
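By contrast, communicate() drains the pipe before waiting, so the same 1 MB child finishes normally; the catch the note points at is that the entire output ends up in a single bytes object. A sketch with the same hypothetical one-liner stand-in:

```python
import subprocess
import sys

child_code = "import sys; sys.stdout.write('x' * 1_000_000)"
p = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)

# communicate() reads stdout until EOF, then waits for the child to exit.
# All of the output is held in memory in `out`; err is None since stderr wasn't piped.
out, err = p.communicate()
print(len(out), p.returncode)  # 1000000 0
```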

p.stdout.read(num_bytes) will not help you:

Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.

http://docs.python.org/library/subprocess.html#subprocess.Popen.stdout
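That warning is about the other pipes. In this sketch (again with a hypothetical Python one-liner as the child), both stdout and stderr are pipes and the child fills stderr before writing anything to stdout, so a parent blocked in p.stdout.read() and a child blocked writing to stderr wait on each other. The read runs in a thread only so the demo can time out instead of hanging.

```python
import subprocess
import sys
import threading

# The child writes ~1 MB to stderr first, then a short message to stdout.
child_code = "import sys; sys.stderr.write('e' * 1_000_000); sys.stdout.write('done')"
p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)

chunks = []
reader = threading.Thread(target=lambda: chunks.append(p.stdout.read()))
reader.start()
reader.join(timeout=2)
stuck = reader.is_alive()  # nobody drains stderr, so the child never reaches stdout

p.kill()         # the child's death closes its end of stdout, so the read returns
reader.join()
p.stdout.close()
p.stderr.close()
p.wait()
print("stuck:", stuck)
```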

The moral of the story is that, for large output, subprocess.PIPE will doom you to certain failure if your program is trying to read the data. (It seems to me that you should be able to put p.stdout.read(num_bytes) inside a while p.poll() is None: loop, but the warning above suggests that this could deadlock.)
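One caveat worth a sketch: when only stdout is piped (stdin and stderr are inherited), there is no second pipe that can fill up, so reading until EOF (an empty bytes object) and only then calling wait() avoids the deadlock the warnings describe. The helper read_in_chunks is hypothetical and the Python one-liner stands in for zgrep.

```python
import subprocess
import sys

def read_in_chunks(cmd, chunk_size=65536):
    """Yield cmd's stdout in chunks, then reap the child.

    Assumes only stdout is a pipe, so no other pipe buffer can fill up.
    """
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        while True:
            chunk = p.stdout.read(chunk_size)
            if not chunk:  # b"" means EOF: the child closed its end of the pipe
                break
            yield chunk
    finally:
        p.stdout.close()
        p.wait()  # loop ends on EOF, not on polling returncode

total = sum(len(c) for c in read_in_chunks([sys.executable, "-c", "print('x' * 100000)"]))
print(total)
```

On POSIX this prints 100001 (the 100000 characters plus the newline from print).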

The docs suggest replacing a shell pipe with this:

from subprocess import Popen, PIPE
p1 = Popen(["zgrep", "thingiwant", "largefile"], stdout=PIPE)
p2 = Popen(["processreceivingdata"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # allow p1 to receive a SIGPIPE if p2 exits
output = p2.communicate()[0]

Notice that p2 is taking its standard input directly from p1. This should avoid deadlocks, but given the contradictory warnings above, who knows.
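A runnable version of that docs pattern, with two Python one-liners as hypothetical stand-ins for zgrep and processreceivingdata: the first prints the numbers 0 to 999 and the second counts the lines containing a 7. Closing p1.stdout in the parent leaves p2 as the pipe's only reader, so if p2 exits early, p1 gets SIGPIPE (or an EPIPE error) instead of writing into a pipe nobody will drain.

```python
import sys
from subprocess import PIPE, Popen

producer = [sys.executable, "-c", "for i in range(1000): print(i)"]
consumer = [sys.executable, "-c",
            "import sys; print(sum(1 for line in sys.stdin if '7' in line))"]

p1 = Popen(producer, stdout=PIPE)
p2 = Popen(consumer, stdin=p1.stdout, stdout=PIPE)  # p2 reads p1's output directly
p1.stdout.close()  # drop the parent's copy so p2 is the only reader
output = p2.communicate()[0]
p1.wait()          # reap the producer as well
print(output)      # b'271\n'
```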

Anyway, if that last part doesn't work for you (it should, though), you could try creating a temporary file, writing all data from the first call to that, and then using the temporary file as input to the next process.
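A sketch of the temporary-file route, assuming Python 3.5+ for subprocess.run; the two Python one-liners are hypothetical stand-ins for the real commands. The first process writes straight into the file, so there is no pipe buffer to fill, and the rewound file then becomes the second process's stdin.

```python
import subprocess
import sys
import tempfile

producer = [sys.executable, "-c", "for i in range(1000): print(i)"]
consumer = [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"]

with tempfile.TemporaryFile() as tmp:
    # Stage 1: the producer's stdout is the file itself, not a pipe.
    subprocess.run(producer, stdout=tmp, check=True)
    tmp.seek(0)  # rewind so the consumer reads from the beginning
    # Stage 2: the same file becomes the consumer's stdin.
    result = subprocess.run(consumer, stdin=tmp, stdout=subprocess.PIPE, check=True)

print(result.stdout)  # b'1000\n'
```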

answered Oct 10 '22 00:10 by Daniel G


After you open the pipe, you can work with the command's output through p.stdout:

for line in p.stdout:
    pass  # do stuff with each line here
p.stdout.close()
p.wait()  # reap the child so it doesn't linger as a zombie
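A self-contained version of that loop, assuming Python 3.7+ for text=True (so lines arrive as str rather than bytes); the Python one-liner is a hypothetical stand-in for zgrep. Reading to the end, closing p.stdout and then calling wait() leaves no child process behind.

```python
import subprocess
import sys

# Hypothetical stand-in for "zgrep thingiwant largefile".
cmd = [sys.executable, "-c", "print('alpha'); print('thingiwant here'); print('omega')"]
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)

matches = []
for line in p.stdout:  # lines are yielded as the child produces them
    if "thingiwant" in line:
        matches.append(line.rstrip("\n"))
p.stdout.close()
p.wait()  # reap the child once its output is exhausted

print(matches)  # ['thingiwant here']
```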

answered Oct 09 '22 22:10 by tzot