How can I feed a subprocess's standard input from a Python iterator?

I am trying to use the subprocess module in Python to communicate with a process that reads standard input and writes standard output in a streaming fashion. I want to have the subprocess read lines from an iterator that produces the input, and then read output lines from the subprocess. There may not be a one-to-one correspondence between input and output lines. How can I feed a subprocess from an arbitrary iterator that returns strings?

Here is a simple test case, along with two approaches I have tried, neither of which does what I need:

#!/usr/bin/python
from subprocess import Popen, PIPE
# A really big iterator
input_iterator = ("hello %s\n" % x for x in xrange(100000000))

# I thought that stdin could be any iterable, but it actually wants a
# filehandle, so this fails with an error.
subproc = Popen("cat", stdin=input_iterator, stdout=PIPE)

# This works, but it first sends *all* the input at once, then returns
# *all* the output as a string, rather than giving me an iterator over
# the output. This uses up all my memory, because the input is several
# hundred million lines.
subproc = Popen("cat", stdin=PIPE, stdout=PIPE)
output, error = subproc.communicate("".join(input_iterator))
output_lines = output.split("\n")

So how can I have my subprocess read from an iterator line by line while I read from its stdout line by line?

Ryan C. Thompson asked Jul 31 '11 22:07
1 Answer

The easiest way seems to be to fork and have the child process feed the input handle. Can anyone elaborate on possible downsides of doing this? Or are there Python modules that make it easier and safer?

#!/usr/bin/python
from subprocess import Popen, PIPE
import os

def fork_and_input(input, handle):
    """Send input to handle in a child process."""
    # Make sure input is iterable before forking
    input = iter(input)
    if os.fork():
        # Parent
        handle.close()
    else:
        # Child
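        try:
            handle.writelines(input)
            handle.close()
        # An IOError here means some *other* part of the program
        # crashed, so don't complain here.
        except IOError:
            pass
        finally:
            # os._exit() requires a status code. Exiting here, whatever
            # happened above, keeps the child from running the parent's code.
            os._exit(0)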

# A really big iterator
input_iterator = ("hello %s\n" % x for x in xrange(100000000))

subproc = Popen("cat", stdin=PIPE, stdout=PIPE)
fork_and_input(input_iterator, subproc.stdin)

for line in subproc.stdout:
    print line,
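
One alternative that avoids forking (and so also works where os.fork is unavailable, such as Windows) is to feed the pipe from a writer thread while the main thread reads the output. The following is only a minimal sketch, assuming Python 3.7+ (for text=True) rather than the Python 2 used above; feed_stdin and stream_through are hypothetical helper names, not part of the subprocess module:

```python
import subprocess
import sys
import threading


def feed_stdin(lines, handle):
    """Write each line to handle, then close it so the subprocess sees EOF."""
    try:
        for line in lines:
            handle.write(line)
    except BrokenPipeError:
        # The subprocess stopped reading before the input was exhausted.
        pass
    finally:
        try:
            handle.close()
        except BrokenPipeError:
            pass


def stream_through(command, lines):
    """Yield the subprocess's output lines while feeding it from an iterator."""
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    # The writer runs concurrently, so neither pipe fills up and blocks
    # while the other side is waiting.
    feeder = threading.Thread(
        target=feed_stdin, args=(lines, proc.stdin), daemon=True
    )
    feeder.start()
    try:
        for out_line in proc.stdout:
            yield out_line
    finally:
        proc.stdout.close()
        feeder.join()
        proc.wait()


if __name__ == "__main__":
    # Small demo input; a much larger generator is consumed lazily too.
    input_iterator = ("hello %s\n" % x for x in range(3))
    for line in stream_through(["cat"], input_iterator):
        sys.stdout.write(line)
```

Because the input is consumed lazily in the thread and the output is yielded line by line, neither side is held in memory as a whole. The sketch assumes the subprocess eventually exits or closes its stdout; a command that reads forever without writing would still need a timeout or an explicit terminate().
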
Ryan C. Thompson answered Sep 30 '22 12:09