With Python's subprocess module, if we wanted to run the shell command
foo | grep bar
from within Python, we might use
from subprocess import Popen, PIPE

p1 = Popen(["foo"], stdout=PIPE)
p2 = Popen(["grep", "bar"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()
output = p2.communicate()[0]
I'm confused about the line p1.stdout.close(). If you'll forgive me, I'll trace through how I think the program works, and the error will hopefully reveal itself.
It seems to me that when Python executes the line output = p2.communicate()[0], it tries to call p2 and recognizes that it needs output from p1. So it calls p1, which executes foo and throws the output on the stack so that p2 can finish executing. And then p2 finishes.
But nowhere in this trace does p1.stdout.close() actually happen. So what is actually happening? It seems to me that the ordering of lines might matter too, so that the following wouldn't work:
p1 = Popen(["foo"], stdout = PIPE)
p1.stdout.close()
p2 = Popen(["grep", "bar"], stdin = p1.stdout, stdout = PIPE)
output = p2.communicate()[0]
And that's the status of my understanding.
p1.stdout.close() is necessary for foo to detect when the pipe is broken, e.g., when p2 exits prematurely.
Without p1.stdout.close(), p1.stdout remains open in the parent process, so even if p2 exits, p1 won't know that nobody is reading p1.stdout. That is, p1 will continue to write to p1.stdout until the corresponding OS pipe buffer is full, and then it just blocks forever.
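To see this behaviour directly, here is a minimal sketch, assuming a POSIX system. The two child commands are stand-ins I made up for foo and grep bar: a producer that writes forever and a consumer that reads one line and exits. The helper name run_pipeline and its timeout parameter are likewise only for illustration.

```python
import subprocess
import sys
from subprocess import DEVNULL, PIPE, Popen

# Hypothetical stand-ins for `foo` and `grep bar`:
# the producer writes forever, the consumer reads one line and exits early.
PRODUCER = [sys.executable, "-c", "while True: print('bar')"]
CONSUMER = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.readline())"]


def run_pipeline(close_parent_copy, timeout=5):
    """Return (consumer output, whether the producer exited on its own)."""
    p1 = Popen(PRODUCER, stdout=PIPE, stderr=DEVNULL)
    p2 = Popen(CONSUMER, stdin=p1.stdout, stdout=PIPE)
    if close_parent_copy:
        # Drop the parent's copy of the read end, so p2 is the only reader.
        p1.stdout.close()
    output = p2.communicate()[0]
    try:
        # With no readers left, the producer's next write fails and it exits.
        p1.wait(timeout=timeout)
        finished = True
    except subprocess.TimeoutExpired:
        # The parent still holds the read end, so the producer is blocked
        # on a full pipe buffer; clean it up by hand.
        finished = False
        p1.kill()
        p1.wait()
    if not close_parent_copy:
        p1.stdout.close()
    return output, finished


if __name__ == "__main__":
    print(run_pipeline(close_parent_copy=True))
    print(run_pipeline(close_parent_copy=False, timeout=2))
```

With the parent's copy closed, the producer should notice the broken pipe and exit shortly after the consumer does. Without the close, it should still be stuck when the timeout expires.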
To emulate the foo | grep bar shell command without a shell:
#!/usr/bin/env python3
from subprocess import Popen, PIPE

with Popen(['grep', 'bar'], stdin=PIPE) as grep, \
     Popen(['foo'], stdout=grep.stdin):
    grep.communicate()
See How do I use subprocess.Popen to connect multiple processes by pipes?
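As for the reordering raised in the question: as far as I can tell, closing p1.stdout before handing it to the second Popen does not get as far as starting p2 on CPython, because Popen asks the already closed file object for its descriptor. The sketch below uses made-up child commands and a hypothetical helper name to check that.

```python
import sys
from subprocess import DEVNULL, PIPE, Popen


def start_consumer_after_close():
    """Return the exception raised when stdin is an already closed pipe."""
    # Hypothetical stand-in for `foo`; stderr is silenced because the
    # child may complain about the pipe we close underneath it.
    p1 = Popen([sys.executable, "-c", "print('bar')"],
               stdout=PIPE, stderr=DEVNULL)
    p1.stdout.close()  # closed too early, as in the reordered snippet
    try:
        p2 = Popen([sys.executable, "-c", "pass"],
                   stdin=p1.stdout, stdout=PIPE)
    except ValueError as exc:
        # Popen needs the file descriptor of a file that is already closed.
        return exc
    finally:
        p1.wait()
    # Not expected to be reached; clean up if it is.
    p2.communicate()
    return None


if __name__ == "__main__":
    print(repr(start_consumer_after_close()))
```

So the close has to come after p2 has been created: by then p2 holds its own copy of the read end, and the parent's copy is the one that is safe to drop.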