Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python subprocess: Giving stdin, reading stdout, then giving more stdin

Tags:

I'm working with a piece of scientific software called Chimera. For some of the code downstream of this question, it requires that I use Python 2.7.

I want to call a process, give that process some input, read its output, give it more input based on that, etc.

I've used Popen to open the process, process.stdin.write to pass standard input, but then I've gotten stuck trying to get output while the process is still running. process.communicate() stops the process, process.stdout.readline() seems to keep me in an infinite loop.


Here's a simplified example of what I'd like to do:

Let's say I have a bash script called exampleInput.sh.

#!/bin/bash
# exampleInput.sh

# Read a number from the input
read -p 'Enter a number: ' num

# Multiply the number by 5
ans1=$( expr $num \* 5 )

# Give the user the multiplied number
echo $ans1

# Ask the user whether they want to keep going
read -p 'Based on the previous output, would you like to continue? ' doContinue

if [ $doContinue == "yes" ]
then
    echo "Okay, moving on..."
    # [...] more code here [...]
else
    exit 0
fi

Interacting with this through the command line, I'd run the script, type in "5" and then, if it returned "25", I'd type "yes" and, if not, I would type "no".

I want to run a python script where I pass exampleInput.sh "5" and, if it gives me "25" back, then I pass "yes"

So far, this is as close as I can get:

#!/home/user/miniconda3/bin/python2
# talk_with_example_input.py
import subprocess
process = subprocess.Popen(["./exampleInput.sh"], 
                        stdin = subprocess.PIPE,
                        stdout = subprocess.PIPE)
process.stdin.write("5")

answer = process.communicate()[0]

if answer == "25":
    process.stdin.write("yes")
    ## I'd like to print the STDOUT here, but the process is already terminated

But that fails of course, because after `process.communicate()', my process isn't running anymore.


(Just in case/FYI): Actual problem

Chimera is usually a gui-based application to examine protein structure. If you run chimera --nogui, it'll open up a prompt and take input.

I often need to know what chimera outputs before I run my next command. For example, I will often try to generate a protein surface and, if Chimera can't generate a surface, it doesn't break--it just says so through STDOUT. So, in my python script, while I'm looping through many proteins to analyze, I need to check STDOUT to know whether to continue analysis on that protein.

In other use cases, I'll run lots of commands through Chimera to clean up a protein first, and then I'll want to run lots of separate commands to get different pieces of data, and use that data to decide whether to run other commands. I could get the data, close the subprocess, and then run another process, but that would require re-running all of those cleaning up commands each time.

Anyways, those are some of the real-world reasons why I want to be able to push STDIN to a subprocess, read the STDOUT, and still be able to push more STDIN.

Thanks for your time!

like image 500
julianstanley Avatar asked Jul 26 '19 18:07

julianstanley


1 Answers

you don't need to use process.communicate in your example.

Simply read and write using process.stdin.write and process.stdout.read. Also make sure to send a newline, otherwise read won't return. And when you read from stdin, you also have to handle newlines coming from echo.

Note: process.stdout.read will block until EOF.

# talk_with_example_input.py
import subprocess

process = subprocess.Popen(["./exampleInput.sh"], 
                        stdin = subprocess.PIPE,
                        stdout = subprocess.PIPE)

process.stdin.write("5\n")
stdout = process.stdout.readline()
print(stdout)

if stdout == "25\n":
    process.stdin.write("yes\n")
    print(process.stdout.readline())
$ python2 test.py
25

Okay, moving on...



Update

When communicating with an program in that way, you have to pay special attention to what the application is actually writing. Best is to analyze the output in a hex editor:

$ chimera --nogui 2>&1 | hexdump -C

Please note that readline [1] only reads to the next newline (\n). In your case you have to call readline at least four times to get that first block of output.

If you just want to read everything up until the subprocess stops printing, you have to read byte by byte and implement a timeout. Sadly, neither read nor readline does provide such a timeout mechanism. This is probably because the underlying read syscall [2] (Linux) does not provide one either.

On Linux we can write a single-threaded read_with_timeout() using poll / select. For an example see [3].

from select import epoll, EPOLLIN

def read_with_timeout(fd, timeout__s):
    """Reads from fd until there is no new data for at least timeout__s seconds.

    This only works on linux > 2.5.44.
    """
    buf = []
    e = epoll()
    e.register(fd, EPOLLIN)
    while True:
        ret = e.poll(timeout__s)
        if not ret or ret[0][1] is not EPOLLIN:
            break
        buf.append(
            fd.read(1)
        )
    return ''.join(buf)

In case you need a reliable way to read non blocking under Windows and Linux, this answer might be helpful.


[1] from the python 2 docs:

readline(limit=-1)

Read and return one line from the stream. If limit is specified, at most limit bytes will be read.

The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.

[2] from man 2 read:

#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);

[3] example

$ tree
.
├── prog.py
└── prog.sh

prog.sh

#!/usr/bin/env bash

for i in $(seq 3); do
  echo "${RANDOM}"
  sleep 1
done

sleep 3
echo "${RANDOM}"

prog.py

# talk_with_example_input.py
import subprocess
from select import epoll, EPOLLIN

def read_with_timeout(fd, timeout__s):
    """Reads from f until there is no new data for at least timeout__s seconds.

    This only works on linux > 2.5.44.
    """
    buf = []
    e = epoll()
    e.register(fd, EPOLLIN)
    while True:
        ret = e.poll(timeout__s)
        if not ret or ret[0][1] is not EPOLLIN:
            break
        buf.append(
            fd.read(1)
        )
    return ''.join(buf)

process = subprocess.Popen(
    ["./prog.sh"],
    stdin = subprocess.PIPE,
    stdout = subprocess.PIPE
)

print(read_with_timeout(process.stdout, 1.5))
print('-----')
print(read_with_timeout(process.stdout, 3))
$ python2 prog.py 
6194
14508
11293

-----
10506


like image 92
Ente Avatar answered Oct 04 '22 02:10

Ente