I am trying to write a Python process that reads some input, processes it and prints out the result. The processing is done by a subprocess (Stanford's NER); for illustration I will use 'cat'. I don't know exactly how much output NER will give, so I run a separate thread to collect it all and print it out. The following example illustrates this.
import sys
import threading
import subprocess

# start my subprocess
cat = subprocess.Popen(
    ['cat'],
    shell=False, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
    stderr=None)

def subproc_cat():
    """ Reads the subprocess output and prints out """
    while True:
        line = cat.stdout.readline()
        if not line:
            break
        print("CAT PROC: %s" % line.decode('UTF-8'))

# a daemon that runs the above function
th = threading.Thread(target=subproc_cat)
th.setDaemon(True)
th.start()

# the main thread reads from stdin and feeds the subprocess
while True:
    line = sys.stdin.readline()
    print("MAIN PROC: %s" % line)
    if not line:
        break
    cat.stdin.write(bytes(line.strip() + "\n", 'UTF-8'))
    cat.stdin.flush()
This seems to work well when I enter text with the keyboard. However, if I try to pipe input into my script (cat file.txt | python3 my_script.py), a race condition seems to occur. Sometimes I get the proper output, sometimes I don't, and sometimes the script hangs. Any help would be appreciated!
I am running Ubuntu 14.04 with Python 3.4.0. The solution should be platform-independent.
Add th.join() at the end. Otherwise, when the main thread exits, you may kill the reader thread prematurely, before it has processed all the output: daemon threads do not survive the main thread. (Alternatively, remove th.setDaemon(True) instead of adding th.join().)
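To make the suggestion concrete, here is a minimal sketch of the same pattern with a non-daemon reader thread and an explicit join. The names pipe_through and ECHO_CMD are made up for this example and are not part of the question's code; ECHO_CMD is just a portable stand-in for 'cat'. One step here goes beyond the advice above and is my own assumption about what the script needs: the child's stdin is closed before joining, so that the child sees EOF and exits. Without that, the reader thread would likely block in readline() forever and the join would never return.

```python
import subprocess
import sys
import threading

# Portable stand-in for 'cat' (an assumption made for this sketch):
# a child Python process that copies its stdin to its stdout.
ECHO_CMD = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]


def pipe_through(cmd, lines):
    """Feed `lines` to `cmd` on stdin and return the lines it prints.

    Hypothetical helper written for illustration only.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    collected = []

    def reader():
        # readline() returns b"" once the child closes its stdout (EOF).
        for raw in iter(proc.stdout.readline, b""):
            collected.append(raw.decode("UTF-8").rstrip("\r\n"))

    # Not a daemon thread: it is joined explicitly below.
    th = threading.Thread(target=reader)
    th.start()

    for line in lines:
        proc.stdin.write((line.strip() + "\n").encode("UTF-8"))
        proc.stdin.flush()

    proc.stdin.close()   # signal EOF so the child can finish
    th.join()            # wait until all output has been collected
    proc.stdout.close()
    proc.wait()          # reap the child process
    return collected
```

In the original script the equivalent would be calling cat.stdin.close() followed by th.join() after the main loop. With this helper, something like pipe_through(['cat'], sys.stdin) should behave the same way whether input is typed or piped in.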