From the docs, using Popen.wait() may:
deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
The communicate() docs say:
The data read is buffered in memory, so do not use this method if the data size is large or unlimited
How can I reproduce this problematic behavior and see that using Popen.communicate() fixes it?
A deadlock means a circular wait between processes holding resources, stuck forever. What is the circular dependency here? The Python process waiting for the child process to terminate is one wait; what is the other? Who waits for what in the scenario below?
it blocks waiting for the OS pipe buffer to accept more data
It's easy to reproduce. Create a process that outputs a lot of text and don't read its output:
import subprocess

p = subprocess.Popen(["ls", "-R"], stdout=subprocess.PIPE)
p.wait()
After a while the standard output pipe is full and the child process is blocked.
It's a deadlock: the subprocess cannot write any more output until someone consumes it (which never happens), and the Python process waits for the subprocess to finish (which it never does).
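A more deterministic way to reproduce it is a minimal sketch like the following (not part of the original example): the child is a small Python one-liner that writes about 1 MB, well above the typical 64 KiB OS pipe buffer.

import subprocess
import sys

# Child writes ~1 MB to its stdout, far more than a typical OS pipe buffer (often 64 KiB).
child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]

p = subprocess.Popen(child, stdout=subprocess.PIPE)
# p.wait()                # hangs: the child blocks on a full pipe, we block on the child
out, _ = p.communicate()  # drains stdout while waiting, so both sides can finish
print(len(out))           # 1000000

Swap the commented-out p.wait() in for communicate() and the script hangs forever; with communicate() it terminates and prints the full output size.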
To avoid the deadlock, you could use a read-line loop:
p = subprocess.Popen(["ls", "-R"], stdout=subprocess.PIPE)
for line in p.stdout:
    pass  # do something with the line
p.wait()
communicate() also fixes that, and it handles the much trickier case where both the output and error streams are redirected to separate pipes (in that case, the naive loop above could still deadlock).
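Applied to the single-pipe example above, that fix is just this (a minimal sketch):

import subprocess

p = subprocess.Popen(["ls", "-R"], stdout=subprocess.PIPE)
output, _ = p.communicate()  # reads stdout to EOF, then reaps the process: no deadlock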
Let's suppose you have a compilation process:
p = subprocess.Popen(["gcc","-c"]+mega_list_of_files,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
Now you want to get the output from this one, so you do:
output = p.stdout.read()
Unfortunately, the compiler produces a lot of errors instead, so the error pipe fills up and blocks the child while you are still reading the output stream: deadlock again.
Read the error stream first instead, and the exact opposite can occur: lots of stdout output fills that pipe and blocks the child.
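Here is a sketch that reproduces this two-pipe deadlock deterministically (the child command is made up for illustration: it floods stderr while writing almost nothing to stdout):

import subprocess
import sys

# Illustrative child: floods stderr (~1 MB) and only then writes to stdout.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('e' * 1000000); sys.stdout.write('ok')"]

p = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# out = p.stdout.read()     # hangs: the child is stuck on a full stderr pipe, so it
#                           # never finishes or closes stdout, and read() never sees EOF
out, err = p.communicate()  # drains both pipes concurrently: no deadlock
print(len(out), len(err))   # 2 1000000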
communicate() reads the output and error streams at the same time (internally it uses threads or select-style I/O multiplexing, depending on the platform) and keeps them separate, without any risk of blocking. The only caveat is that you cannot process the output line by line or print the program's output in real time:
p = subprocess.Popen(["gcc", "-c"] + mega_list_of_files,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = p.communicate()
return_code = p.wait()  # returns immediately: communicate() has already waited for the process
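If you do need the output line by line in real time with both streams piped, one common workaround (a sketch under that assumption, not something the answer above shows) is to drain each stream in its own thread so neither pipe can fill up:

import subprocess
import threading

mega_list_of_files = ["a.c", "b.c"]  # placeholder for the file list used above

def pump(stream, prefix):
    # Drain one pipe line by line so the child can never block on it.
    for line in stream:
        print(prefix, line, end="")
    stream.close()

p = subprocess.Popen(["gcc", "-c"] + mega_list_of_files,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
t_out = threading.Thread(target=pump, args=(p.stdout, "out:"))
t_err = threading.Thread(target=pump, args=(p.stderr, "err:"))
t_out.start(); t_err.start()
t_out.join(); t_err.join()
return_code = p.wait()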