 

Detecting the end of the stream on popen.stdout.readline

I have a python program which launches subprocesses using Popen and consumes their output nearly real-time as it is produced. The code of the relevant loop is:

def run(self, output_consumer):
    self.prepare_to_run()
    popen_args = self.get_popen_args()
    logging.debug("Calling popen with arguments %s" % popen_args)
    self.popen = subprocess.Popen(**popen_args)
    while True:
        outdata = self.popen.stdout.readline()
        if not outdata and self.popen.returncode is not None:
            # Terminate when we've read all the output and the returncode is set
            break
        output_consumer.process_output(outdata)
        self.popen.poll()  # updates returncode so we can exit the loop
    output_consumer.finish(self.popen.returncode)
    self.post_run()

def get_popen_args(self):
    return {
        'args': self.command,
        'shell': False, # Just being explicit for security's sake
        'bufsize': 0,   # More likely to see what's being printed as it happens
                        # Not guaranteed, since the process itself might buffer its output;
                        # run `python -u` to unbuffer the output of a Python process
        'cwd': self.get_cwd(),
        'env': self.get_environment(),
        'stdout': subprocess.PIPE,
        'stderr': subprocess.STDOUT,
        'close_fds': True,  # Doesn't seem to matter
    }

This works great on my production machines, but on my dev machine, the call to .readline() hangs when certain subprocesses complete. That is, it successfully processes all of the output, including the final output line saying "process complete", but then the next call to readline() never returns. The method exits properly on the dev machine for most of the subprocesses I call, but consistently fails to exit for one complex bash script that itself calls many subprocesses.

It's worth noting that popen.returncode gets set to a non-None (usually 0) value many lines before the end of the output. So I can't just break out of the loop when that is set or else I lose everything that gets spat out at the end of the process and is still buffered waiting for reading. The problem is that when I'm flushing the buffer at that point, I can't tell when I'm at the end because the last call to readline() hangs. Calling read() also hangs. Calling read(1) gets me every last character out, but also hangs after the final line. popen.stdout.closed is always False. How can I tell when I'm at the end?

All systems are running python 2.7.3 on Ubuntu 12.04LTS. FWIW, stderr is being merged with stdout using stderr=subprocess.STDOUT.

Why the difference? Is it failing to close stdout for some reason? Could the sub-sub-processes do something to keep it open somehow? Could it be because I'm launching the process from a terminal on my dev box, but in production it's launched as a daemon through supervisord? Would that change the way the pipes are processed and if so how do I normalize them?
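One way to probe the terminal-vs-daemon hypothesis by hand is to check what a child actually sees on stdin. This is just an illustrative sketch; the `/dev/null` redirection mimics what a daemon manager like supervisord presumably does:

```python
import os
import subprocess
import sys

# A child that reports whether its stdin is a terminal.
check = [sys.executable, '-c', 'import sys; print(sys.stdin.isatty())']

# With stdin redirected from /dev/null -- as a daemon manager like
# supervisord presumably does -- the child never sees a tty:
devnull = open(os.devnull, 'rb')
out = subprocess.check_output(check, stdin=devnull)
devnull.close()
print(out.strip())  # b'False'
```

Run from an interactive terminal without the redirection, the same child would report True, which is exactly the kind of environment difference that can change a sub-process's behavior.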

asked Feb 13 '13 16:02 by Leopd


2 Answers

The main code loop looks right. It could be that the pipe isn't closing because another process is keeping it open. For example, if the script launches a background process that writes to stdout, then the pipe will not close until that process exits. Are you sure no other child process is still running?
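To illustrate, here's a minimal sketch (assuming a POSIX `sh` is available) showing that readline() only reaches EOF once *every* process holding the write end of the pipe has exited, even though the direct child's returncode is set almost immediately:

```python
import subprocess
import time

# The child exits immediately, but leaves behind a background process
# that inherits (and so keeps open) the write end of the stdout pipe.
script = 'echo parent-done; (sleep 0.3; echo background-done) &'
p = subprocess.Popen(['sh', '-c', script], stdout=subprocess.PIPE)

start = time.time()
lines = []
while True:
    line = p.stdout.readline()
    if not line:  # EOF only arrives once every writer has closed
        break
    lines.append(line.decode().strip())
elapsed = time.time() - start
p.wait()
```

Here `elapsed` is roughly the lifetime of the background writer, not of the direct child, which is exactly the hang described in the question if that background process never exits.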

One idea is to change modes once you see that .returncode has been set. Once you know the main process is done, read all of its remaining output from the buffer, but don't get stuck waiting. You can use select to read from the pipe with a timeout: set a timeout of a few seconds, and you can drain the buffer without blocking forever on the child process.
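A rough sketch of that approach, with a trivial stand-in command; the 3-second timeout is an arbitrary choice:

```python
import select
import subprocess

p = subprocess.Popen(['sh', '-c', 'echo line1; echo line2'],
                     stdout=subprocess.PIPE, bufsize=0)
p.wait()  # stand-in for "we've just noticed returncode is set"

# Drain whatever is left in the pipe, but never block for more than
# a few seconds on any single read.
collected = []
while True:
    ready, _, _ = select.select([p.stdout], [], [], 3.0)
    if not ready:
        break  # timed out: no writer is producing data any more
    line = p.stdout.readline()
    if not line:
        break  # genuine EOF
    collected.append(line)
```

The trade-off is the worst-case extra wait of one timeout period at the end, in exchange for never hanging indefinitely.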

answered Sep 29 '22 06:09 by muudscope


Without knowing the contents of the "one complex bash script" which causes the problem, there are too many possibilities to determine the exact cause.

However, given that you say it works when your Python script runs under supervisord, it might be getting stuck because a sub-process is trying to read from stdin, or simply behaves differently when stdin is a tty, which (I presume) supervisord will redirect from /dev/null.

This minimal example seems to cope better with cases where my example test.sh runs subprocesses which try to read from stdin...

import os
import subprocess

f = subprocess.Popen(args='./test.sh',
                     shell=False,
                     bufsize=0,
                     stdin=open(os.devnull, 'rb'),
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     close_fds=True)

while True:
    s = f.stdout.readline()
    if not s and f.returncode is not None:
        break
    print s.strip()
    f.poll()
print "done %d" % f.returncode

Otherwise, you can always fall back to using a non-blocking read, and bail out when you get your final output line saying "process complete", although it's a bit of a hack.
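A sketch of that fallback using fcntl to make the pipe non-blocking (POSIX only; the sentinel string here stands in for whatever your process prints last):

```python
import errno
import fcntl
import os
import subprocess
import time

p = subprocess.Popen(['sh', '-c', 'echo process complete'],
                     stdout=subprocess.PIPE)

# Switch the read end of the pipe to non-blocking mode.
fd = p.stdout.fileno()
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

output = b''
while b'process complete' not in output:
    try:
        chunk = os.read(fd, 4096)
    except OSError as e:
        if e.errno != errno.EAGAIN:
            raise
        time.sleep(0.05)  # nothing to read yet; poll again shortly
        continue
    if not chunk:
        break  # real EOF: every writer closed the pipe
    output += chunk
p.wait()
```

It's hacky because it depends on knowing the final output line in advance, but it sidesteps the blocked readline() entirely.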

answered Sep 29 '22 05:09 by Aya