
Streaming read from subprocess

I need to read output from a child process as it's produced -- perhaps not on every write, but well before the process completes. I've tried solutions from the Python3 docs and SO questions here and here, but I still get nothing until the child terminates.

The application is for monitoring training of a deep learning model. I need to grab the test output (about 250 bytes for each iteration, at roughly 1-minute intervals) and watch for statistical failures.

  • I cannot change the training engine; for instance, I cannot insert stdout.flush() in the child process code.
  • I can reasonably wait for a dozen lines of output to accumulate; I was hopeful of a buffer-fill solving my problem.

Code: variations are commented out.

Parent

cmd = ["/usr/bin/python3", "zzz.py"]
# test_proc = subprocess.Popen(
test_proc = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
    )

out_data = ""
print(time.time(), "START")
while not "QUIT" in str(out_data):
    out_data = test_proc.stdout
    # out_data, err_data = test_proc.communicate()
    print(time.time(), "MAIN received", out_data)

Child (zzz.py)

from time import sleep
import sys

for _ in range(5):
    print(_, "sleeping", "."*1000)
    # sys.stdout.flush()
    sleep(1)

print("QUIT this exercise")

Despite sending lines of 1000+ bytes, filling the buffer (measured elsewhere as 2 KB; here I've gone as high as 50 KB) doesn't cause the parent to "see" the new text.

What am I missing to get this to work?


Update with regard to links, comments, and iBug's posted answer:

  • Popen instead of run fixed the blocking issue. Somehow I missed this in the documentation and my experiments with both.
  • universal_newlines=True neatly changed the bytes return to string: easier to handle on the receiving end, although with interleaved empty lines (easy to detect and discard).
  • Setting bufsize to something tiny (e.g. 1) didn't affect anything, since bufsize only governs the parent's end of the pipe; the parent still has to wait for the child to fill its own stdout buffer, 8k in my case.
  • export PYTHONUNBUFFERED=1 before execution did fix the buffering problem. Thanks to wim for the link.
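
A minimal sketch of how the points above might fit together, setting PYTHONUNBUFFERED from the parent instead of exporting it in the shell. The names CHILD_CODE and stream_lines are hypothetical, and the inline child is only a stand-in for zzz.py so the example is self-contained; this assumes the child is a Python program, since the variable has no effect on other runtimes.

```python
import os
import subprocess
import sys
import time

# Stand-in for zzz.py (an assumption for this sketch); note there is no flush().
CHILD_CODE = (
    "import time\n"
    "for i in range(3):\n"
    "    print(i, 'sleeping')\n"
    "    time.sleep(0.2)\n"
    "print('QUIT this exercise')\n"
)


def stream_lines(cmd):
    """Yield the child's output one line at a time, as it is produced."""
    # Same effect as `export PYTHONUNBUFFERED=1`, but scoped to this child only.
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode bytes to str
        env=env,
    ) as proc:
        for line in proc.stdout:  # ends at EOF, when the child closes stdout
            yield line.rstrip("\n")


received = []  # (timestamp, line) pairs, so arrival times can be inspected
for line in stream_lines([sys.executable, "-c", CHILD_CODE]):
    received.append((time.monotonic(), line))
    print(time.time(), "MAIN received", line)
```

Passing env to Popen keeps the setting local to the one child rather than the whole shell session. If the timestamps come out spread across the child's sleeps rather than bunched at the end, the output is being streamed.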

Unless someone comes up with a canonical, nifty solution that makes these obsolete, I'll accept iBug's answer tomorrow.

Prune asked Dec 29 '18 01:12


1 Answer

subprocess.run spawns the child process and blocks the calling thread until the child exits, so there is nothing to read until then.

Use p = subprocess.Popen(...) instead, and read lines with s = p.stdout.readline() or by iterating over p.stdout (see below).

This code works for me, provided the child process flushes stdout after printing each line (see the note at the end if you cannot modify the child).

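import subprocess
import time

cmd = ["/usr/bin/python3", "zzz.py"]
test_proc = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
)

out_data = b""
print(time.time(), "START")
while b"QUIT" not in out_data:
    out_data = test_proc.stdout.readline()  # blocks until a full line arrives
    if not out_data:  # b"" means EOF: the child closed its stdout
        break
    print(time.time(), "MAIN received", out_data)
test_proc.communicate()  # drain any remaining output and wait for exit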

See my terminal log (dots removed from zzz.py):

ibug@ubuntu:~/t $ python3 p.py
1546450821.9174328 START
1546450821.9793346 MAIN received b'0 sleeping \n'
1546450822.987753 MAIN received b'1 sleeping \n'
1546450823.993136 MAIN received b'2 sleeping \n'
1546450824.997726 MAIN received b'3 sleeping \n'
1546450825.9975247 MAIN received b'4 sleeping \n'
1546450827.0094354 MAIN received b'QUIT this exercise\n'

You can also do it with a for loop:

for out_data in test_proc.stdout:
    if "QUIT" in str(out_data):
        break
    print(time.time(), "MAIN received", out_data)

If you cannot modify the child process, unbuffer (from package expect - install with APT or YUM) may help. This is my working parent code without changing the child code.

test_proc = subprocess.Popen(
    ["unbuffer"] + cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
)
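
If unbuffer may not be installed on every machine, Popen would raise FileNotFoundError when it tries to launch it. A small guard can avoid that. This is only a sketch: the helper name with_unbuffer is hypothetical, and falling back to the plain command means the child's output may be block-buffered again.

```python
import shutil


def with_unbuffer(cmd):
    """Prefix cmd with unbuffer when it is on PATH; otherwise return cmd as-is."""
    if shutil.which("unbuffer"):
        return ["unbuffer"] + list(cmd)
    # Fallback: run the command unchanged (output may arrive in large chunks).
    return list(cmd)


cmd = ["/usr/bin/python3", "zzz.py"]
launch_cmd = with_unbuffer(cmd)
print(launch_cmd)  # pass this list to subprocess.Popen in place of cmd
```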
iBug answered Sep 23 '22 03:09