How to collect output from a Python subprocess

I am trying to write a Python program that reads some input, processes it, and prints the result. The processing is done by a subprocess (Stanford's NER); for illustration I will use 'cat'. I don't know in advance how much output NER will produce, so I run a separate thread to collect all of it and print it. The following example illustrates this.

import sys
import threading
import subprocess

#   start my subprocess
cat = subprocess.Popen(
    ['cat'],
    shell=False, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
    stderr=None)


def subproc_cat():
    """ Reads the subprocess output and prints out """
    while True:
        line = cat.stdout.readline()
        if not line:
            break
        print("CAT PROC: %s" % line.decode('UTF-8'))

#   a daemon that runs the above function
th = threading.Thread(target=subproc_cat)
th.setDaemon(True)
th.start()

#   the main thread reads from stdin and feeds the subprocess
while True:
    line = sys.stdin.readline()
    print("MAIN PROC: %s" % line)
    if not line:
        break
    cat.stdin.write(bytes(line.strip() + "\n", 'UTF-8'))
    cat.stdin.flush()

This seems to work well when I type text on the keyboard. However, if I pipe input into my script (cat file.txt | python3 my_script.py), a race condition seems to occur: sometimes I get the proper output, sometimes not, and sometimes the script hangs. Any help would be appreciated!

I am running Ubuntu 14.04 with Python 3.4.0. The solution should be platform-independent.

Florijan Stamenković asked May 26 '15 at 07:05


1 Answer

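Add th.join() at the end. Otherwise the main thread may exit before the reader thread has processed all of the output, and the thread is killed prematurely: daemon threads do not survive the main thread. (Alternatively, remove th.setDaemon(True) instead of adding th.join(); the interpreter waits for non-daemon threads before it exits.) In either case, call cat.stdin.close() after the input loop and before waiting for the thread: cat exits, and cat.stdout.readline() returns b'' (EOF), only once its stdin has been closed, so without the close the reader thread blocks forever and the program hangs.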
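A minimal sketch of the corrected flow, assuming Python 3. To keep it platform-independent it swaps 'cat' for a tiny Python child process that echoes its input. The function name pipe_through and its on_line callback are hypothetical names chosen for illustration. The reader thread is not a daemon, stdin is closed to signal EOF, and the thread is joined before returning.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for 'cat' / Stanford NER: a small Python child that
# echoes each input line. "-u" keeps its stdout unbuffered so output streams.
CHILD_CMD = [
    sys.executable, "-u", "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line)\n",
]


def pipe_through(lines, on_line=None):
    """Feed `lines` to the child process; return every line it printed.

    `on_line` (hypothetical) is called with each output line as it arrives.
    """
    proc = subprocess.Popen(CHILD_CMD, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    collected = []

    def reader():
        # readline() returns b"" only at EOF, i.e. once the child has exited
        # or closed its stdout.
        for raw in iter(proc.stdout.readline, b""):
            text = raw.decode("utf-8").rstrip("\r\n")
            collected.append(text)
            if on_line is not None:
                on_line(text)
        proc.stdout.close()

    th = threading.Thread(target=reader)  # deliberately NOT a daemon thread
    th.start()

    for line in lines:
        proc.stdin.write((line.rstrip("\r\n") + "\n").encode("utf-8"))
        proc.stdin.flush()

    proc.stdin.close()  # send EOF; without this the child and reader block forever
    th.join()           # wait until the reader has consumed all output
    proc.wait()         # reap the child process
    return collected
```

Called as pipe_through(sys.stdin, on_line=lambda s: print("CAT PROC: %s" % s)), it behaves like the question's script, including when the input is piped in. If you do not need to see the output while input is still being written, Popen.communicate(input) is simpler: it writes all the input, closes stdin, and reads stdout until EOF for you.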

jfs answered Sep 24 '22 at 20:09