pipe large amount of data to stdin while using subprocess.Popen

Tags: python, subprocess, popen

I'm struggling to understand the Pythonic way to solve this simple problem.

My problem is simple: the following code hangs (deadlocks). This is well documented in the subprocess module docs.

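import subprocess

proc = subprocess.Popen(['cat', '-'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        )
# Nothing reads stdout during this loop: once cat's output pipe fills up,
# cat stops reading stdin, its input pipe fills too, and this write blocks forever.
for i in range(100000):
    proc.stdin.write(b'%d\n' % i)
output = proc.communicate()[0]
print(output.decode())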
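When the whole input can be built in memory first, the standard way to avoid this deadlock is to hand the data to `communicate(input=...)` instead of writing it by hand: `communicate()` writes stdin and drains stdout concurrently (using a selector on POSIX and threads on Windows), so neither pipe can fill up and stall the other side. A minimal sketch, assuming the data fits comfortably in memory:

```python
import subprocess

# Build the whole input up front (assumes it fits in memory).
data = b"".join(b"%d\n" % i for i in range(100000))

proc = subprocess.Popen(['cat', '-'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
# communicate() feeds `data` to stdin, closes it, and reads stdout at the
# same time, then waits for the process to exit.
output, _ = proc.communicate(input=data)
print(output.count(b"\n"))  # → 100000
```

`subprocess.run(['cat', '-'], input=data, capture_output=True)` is the one-call shorthand for the same thing. It does not help when the input must be streamed, which is what the rest of this question is about.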

While searching for a solution (there was a very insightful thread, but I've lost it), I found this approach, among others, which uses an explicit fork:

import os
import sys
from subprocess import Popen, PIPE

def produce(to_sed):
    for i in range(100000):
        to_sed.write(b"%d\n" % i)
    # Flush and close explicitly: os._exit() in the child skips Python's
    # cleanup, and closing is what sends EOF to cat.
    to_sed.close()

def consume(from_sed):
    # Iterate until cat closes its stdout (EOF).
    for res in from_sed:
        print('received: ', [res])

def main():
    proc = Popen(['cat', '-'], stdin=PIPE, stdout=PIPE)
    to_sed = proc.stdin
    from_sed = proc.stdout

    pid = os.fork()
    if pid == 0:
        # Child: only writes, so drop its copy of the read end.
        from_sed.close()
        produce(to_sed)
        os._exit(0)
    else:
        # Parent: only reads; drop its copy of the write end so cat can see EOF.
        to_sed.close()
        consume(from_sed)
        os.waitpid(pid, 0)       # reap the forked writer
        sys.exit(proc.wait())    # exit with cat's return code

if __name__ == '__main__':
    main()

While this solution is conceptually very easy to understand, it uses an extra process and feels too low-level compared to the subprocess module (which exists precisely to hide this kind of thing).

I'm wondering: is there a simple, clean solution using the subprocess module that won't hang, or do I have to take a step back and implement this pattern with an old-style select loop or an explicit fork?
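For comparison, the old-style select loop mentioned above might look roughly like the sketch below. It is POSIX-only (on most platforms `selectors` cannot wait on pipes elsewhere), and the function name `run_with_select` and the 64 KiB chunk size are illustrative choices, not part of any library API. The stdin pipe is switched to non-blocking mode so a write can be partial but never stalls the loop.

```python
import os
import selectors
import subprocess

CHUNK = 65536  # illustrative read/write size

def run_with_select(cmd, data):
    """Feed `data` (bytes) to cmd's stdin while draining its stdout (POSIX only)."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    in_fd = proc.stdin.fileno()
    out_fd = proc.stdout.fileno()
    os.set_blocking(in_fd, False)  # writes may be partial but never block

    sel = selectors.DefaultSelector()
    sel.register(in_fd, selectors.EVENT_WRITE)
    sel.register(out_fd, selectors.EVENT_READ)

    view = memoryview(data)
    offset = 0
    chunks = []
    while sel.get_map():  # loop until both pipes are finished
        for key, _ in sel.select():
            if key.fd == in_fd:
                try:
                    offset += os.write(in_fd, view[offset:offset + CHUNK])
                except BlockingIOError:
                    continue  # pipe is full right now; try again later
                except BrokenPipeError:
                    offset = len(data)  # child stopped reading; drop the rest
                if offset >= len(data):
                    sel.unregister(in_fd)
                    proc.stdin.close()  # EOF lets the child finish
            else:
                chunk = os.read(out_fd, CHUNK)
                if chunk:
                    chunks.append(chunk)
                else:  # EOF: the child closed its stdout
                    sel.unregister(out_fd)
    sel.close()
    proc.stdout.close()
    proc.wait()
    return b"".join(chunks)

data = b"".join(b"%d\n" % i for i in range(100000))
out = run_with_select(['cat', '-'], data)
```

This is essentially what `Popen.communicate()` does internally on POSIX, which is why it is rarely worth writing by hand.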

Thanks

asked May 06 '11 12:05

pietro abate

1 Answer

If you want a pure Python solution, you need to put either the reader or the writer in a separate thread. The threading module is a lightweight way to do this, with convenient access to shared objects and no messy forking.

import subprocess
import threading
import sys

proc = subprocess.Popen(['cat','-'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        )
def writer():
    for i in range(100000):
        proc.stdin.write(b'%d\n' % i)
    proc.stdin.close()
thread = threading.Thread(target=writer)
thread.start()
for line in proc.stdout:
    sys.stdout.write(line.decode())
thread.join()
proc.wait()
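The same idea works the other way round, with the reader in the background thread and the writer in the main thread. This can be handier when the main program is the one producing data at its own pace. A minimal sketch, where the `received` list is only an illustrative way to collect the output:

```python
import subprocess
import threading

proc = subprocess.Popen(['cat', '-'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
received = []

def reader():
    # Drain stdout in the background so cat never blocks on a full pipe.
    for line in proc.stdout:
        received.append(line)

thread = threading.Thread(target=reader)
thread.start()
# The main thread is free to generate input however it likes.
for i in range(100000):
    proc.stdin.write(b'%d\n' % i)
proc.stdin.close()  # EOF lets cat exit, which ends the reader's loop
thread.join()
proc.wait()
print(len(received))  # → 100000
```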

It might be neat to see the subprocess module modernized to support streams and coroutines, which would allow pipelines that mix Python pieces and shell pieces to be constructed more elegantly.
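The standard library's asyncio module already offers a coroutine-based subprocess interface (`asyncio.create_subprocess_exec`, with `StreamWriter`/`StreamReader` pipes), although it lives outside the subprocess module itself. A minimal sketch of the same `cat` pipeline, assuming Python 3.8 or later; `run_cat` is a hypothetical helper name:

```python
import asyncio

async def run_cat(n):
    proc = await asyncio.create_subprocess_exec(
        'cat', '-',
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE)

    async def write_input():
        for i in range(n):
            proc.stdin.write(b'%d\n' % i)
            await proc.stdin.drain()  # suspends only while the pipe is backed up
        proc.stdin.close()            # EOF for cat
        await proc.stdin.wait_closed()

    async def read_output():
        lines = []
        async for line in proc.stdout:  # StreamReader yields one line at a time
            lines.append(line)
        return lines

    # Writer and reader run concurrently on one event loop, so neither pipe deadlocks.
    _, lines = await asyncio.gather(write_input(), read_output())
    await proc.wait()
    return lines

lines = asyncio.run(run_cat(100000))
print(len(lines))  # → 100000
```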

answered Oct 24 '22 05:10

Jed
