Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: select() doesn't signal all input from pipe

I am trying to load an external command line program with Python and communicate with it via pipes. The progam takes text input via stdin and produces text output in lines to stdout. Communication should be asynchronous using select().

The problem is, that not all output of the program is signalled in select(). Usually the last one or two lines are not signalled. If select() returns with a timeout and I am trying to read from the pipe anyway readline() returns immediately with the line sent from the program. See code below.

The program doesn't buffer the output and sends all output in text lines. Connecting to the program via pipes in many other languages and environments has worked fine so far.

I have tried Python 3.1 and 3.2 on Mac OSX 10.6.

import subprocess
import select

engine = subprocess.Popen("Engine", bufsize=0, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
engine.stdin.write(b"go\n")
engine.stdin.flush()

while True:
    inputready,outputready,exceptready = select.select( [engine.stdout.fileno()] , [], [], 10.0)

    if (inputready, outputready, exceptready) == ([], [], []):
        print("trying to read from engine anyway...")
        line = engine.stdout.readline()
        print(line)

     for s in inputready:
        line = engine.stdout.readline()
        print(line)
like image 326
StefanMK Avatar asked Mar 30 '11 13:03

StefanMK


1 Answers

Note that internally file.readlines([size]) loops and invokes the read() syscall more than once, attempting to fill an internal buffer of size. The first call to read() will immediately return, since select() indicated the fd was readable. However the 2nd call will block until data is available, which defeats the purpose of using select. In any case it is tricky to use file.readlines([size]) in an asynchronous app.

You should call os.read(fd, size) once on each fd for every pass through select. This performs a non-blocking read, and lets you buffer partial lines until data is available and detects EOF unambiguously.

I modified your code to illustrate using os.read. It also reads from the process' stderr:

import os
import select
import subprocess
from cStringIO import StringIO

target = 'Engine'
PIPE = subprocess.PIPE
engine = subprocess.Popen(target, bufsize=0, stdin=PIPE, stdout=PIPE, stderr=PIPE)
engine.stdin.write(b"go\n")
engine.stdin.flush()

class LineReader(object):

    def __init__(self, fd):
        self._fd = fd
        self._buf = ''

    def fileno(self):
        return self._fd

    def readlines(self):
        data = os.read(self._fd, 4096)
        if not data:
            # EOF
            return None
        self._buf += data
        if '\n' not in data:
            return []
        tmp = self._buf.split('\n')
        lines, self._buf = tmp[:-1], tmp[-1]
        return lines

proc_stdout = LineReader(engine.stdout.fileno())
proc_stderr = LineReader(engine.stderr.fileno())
readable = [proc_stdout, proc_stderr]

while readable:
    ready = select.select(readable, [], [], 10.0)[0]
    if not ready:
        continue
    for stream in ready:
        lines = stream.readlines()
        if lines is None:
            # got EOF on this stream
            readable.remove(stream)
            continue
        for line in lines:
            print line
like image 158
samplebias Avatar answered Oct 26 '22 00:10

samplebias