 

Python child script consumes all of stdin

I discovered some strange behaviour with raw_input/readline when running python scripts within bash scripts.

In brief, when passing all of stdin at once (each entry separated by a newline) to a parent script, the bash child scripts only take the stdin they require, while the Python child scripts consume all of stdin, leaving nothing for the next children. I've come up with a simple example to demonstrate what I mean:

Parent script (parent.sh)

#!/bin/bash

./child.sh
./child.sh
./child.py
./child.py

Bash child script (child.sh)

#!/bin/bash

read -a INPUT
echo "sh: got input: ${INPUT}"

Python child script (child.py)

#!/usr/bin/python -B

import sys

INPUT = raw_input()
print "py: got input: {}".format(INPUT)

Expected Result

./parent.sh <<< $'aa\nbb\ncc\ndd'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> py: got input: dd

Actual Result

./parent.sh <<< $'aa\nbb\ncc\ndd\n'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> Traceback (most recent call last):
>>   File "./child.py", line 5, in <module>
>>     INPUT = raw_input()
>> EOFError: EOF when reading a line

raw_input seems to purge all of the remaining lines in stdin. Using sys.stdin.readline instead of raw_input does not raise an EOFError; however, the input received is an empty string rather than the expected 'dd'.

What is happening here? How can I avoid this behaviour, such that the last child script receives the expected input?

edit: Just to be sure, I added a few more lines to stdin, and the result is the same:

./parent.sh <<< $'aa\nbb\ncc\ndd\nff\nee\n'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> Traceback (most recent call last):
>>   File "./child.py", line 5, in <module>
>>     INPUT = raw_input()
>> EOFError: EOF when reading a line
yobiscus asked Feb 21 '26 03:02


2 Answers

Here's an easier way of demonstrating the same issue:

printf "%s\n" foo bar | {
    head -n 1
    head -n 1
}

By all accounts, this looks like it should print two lines, but bar is mysteriously missing.

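This happens because reading lines is a lie: the UNIX programming model has no support for it. The read(2) system call asks for up to a given number of bytes, not for a line, and a pipe hands back whatever data is available.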

Instead, what basically all tools do is read a whole buffer (typically a few kilobytes), carve out the first line, and keep the rest of the buffer for the next call. This is true for head, Python raw_input(), C fgets(), Java BufferedReader.readLine() and pretty much everything else.

Since UNIX counts the entire buffer as consumed, regardless of how much the program actually ends up using, the rest of the buffer is discarded when the program exits.
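To see this from Python, here is a minimal sketch (Python 3 assumed; the helper name `buffered_readline_leftover` is hypothetical). It writes two lines into a pipe, reads one line through an ordinary buffered reader, then looks at what is still left in the pipe itself:

```python
import os


def buffered_readline_leftover(data=b"foo\nbar\n"):
    """Read one line through a buffered reader; return (line, bytes still in the pipe)."""
    r, w = os.pipe()
    os.write(w, data)
    os.close(w)  # no more writers, so reading an empty pipe returns EOF

    reader = os.fdopen(r, "rb")      # default buffering, as sys.stdin uses
    line = reader.readline()         # a single read(2) pulls in *both* lines
    left_in_pipe = os.read(r, 100)   # bypass the buffer and look at the pipe itself
    reader.close()
    return line, left_in_pipe


line, left = buffered_readline_leftover()
print(line, left)  # b'foo\n' b''
```

The second line only survives inside `reader`'s private buffer; a process exiting at that point takes it along, which is exactly what the second `head` (or the second child.py) runs into.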

bash, however, works around it: it reads byte by byte until it reaches a line feed. This is very inefficient, but it allows read to only consume a single line from the stream, leaving the rest in place for the next process.
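The byte-at-a-time strategy can be sketched in Python with plain `os.read(fd, 1)` calls. This only illustrates the idea and is not bash's actual implementation; the helper name `read_line_bytewise` is hypothetical:

```python
import os


def read_line_bytewise(fd):
    """Read a single line from fd one byte at a time.

    It never reads past the newline, so anything after it stays in the
    pipe for the next reader (for example, the next process).
    """
    chunks = []
    while True:
        byte = os.read(fd, 1)  # one system call per byte: slow but precise
        if not byte:           # EOF
            break
        chunks.append(byte)
        if byte == b"\n":
            break
    return b"".join(chunks)


# Usage: after reading the first line, the second is still in the pipe.
r, w = os.pipe()
os.write(w, b"foo\nbar\n")
os.close(w)
first = read_line_bytewise(r)
rest = os.read(r, 100)
os.close(r)
print(first, rest)  # b'foo\n' b'bar\n'
```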

You can do the same thing in Python by opening a raw, unbuffered reader:

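import os
import sys

# Reopen stdin's file descriptor with buffering=0: readline() on this raw
# reader fetches one byte at a time and stops right after the newline,
# leaving the rest of stdin for the next process.
f = os.fdopen(sys.stdin.fileno(), 'rb', 0)
line = f.readline().rstrip(b'\n')  # unlike [:-1], safe if the last line lacks '\n'
print("Python read: " + line.decode())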

We can test this the same way:

printf "%s\n" foo bar | {
    python myscript
    python myscript
}

prints

Python read: foo
Python read: bar
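To reproduce the whole effect from a single script, here is a sketch (Python 3 on a POSIX system assumed; `run_children` and the two child snippets are hypothetical) that starts two Python children one after another on the same stdin pipe, once with the ordinary buffered `sys.stdin.readline()` and once with an unbuffered reader like the one above:

```python
import os
import subprocess
import sys

# Hypothetical child programs, passed to the interpreter with -c.
BUFFERED_CHILD = "import sys; sys.stdout.write(sys.stdin.readline())"
UNBUFFERED_CHILD = (
    "import os, sys; "
    "f = os.fdopen(sys.stdin.fileno(), 'rb', 0); "
    "sys.stdout.write(f.readline().decode())"
)


def run_children(child_source, data=b"foo\nbar\n", count=2):
    """Run `count` Python children one after another, all reading the same stdin pipe."""
    r, w = os.pipe()
    os.write(w, data)  # small payload, fits in the pipe buffer
    os.close(w)        # children see EOF once the data is used up
    outputs = []
    try:
        for _ in range(count):
            result = subprocess.run(
                [sys.executable, "-c", child_source],
                stdin=r, stdout=subprocess.PIPE, check=True,
            )
            outputs.append(result.stdout.decode())
    finally:
        os.close(r)
    return outputs


print(run_children(BUFFERED_CHILD))    # ['foo\n', '']
print(run_children(UNBUFFERED_CHILD))  # ['foo\n', 'bar\n']
```

The buffered child swallows both lines, so the second child only sees EOF; the unbuffered child leaves `bar` for its successor.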
that other guy answered Feb 23 '26 15:02


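The Python interpreter buffers standard input by default. In Python 2 you can use the -u option to make stdin (along with stdout and stderr) unbuffered, although this is less efficient. In Python 3, -u only affects stdout and stderr and has no effect on stdin.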

parent.sh

#!/bin/bash

./child.sh
./child.sh
python -u child.py
python -u child.py

output

./parent.sh <<< $'aa\nbb\ncc\ndd'
sh: got input: aa
sh: got input: bb
py: got input: cc
py: got input: dd
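Under Python 3, where `-u` leaves stdin buffered, a similar effect can be had by reading from `sys.stdin.buffer.raw`, the unbuffered `FileIO` object beneath `sys.stdin`. A minimal sketch (the function name `read_input_line` is hypothetical), checked against a pipe rather than the real stdin:

```python
import os
import sys


def read_input_line(raw=None):
    """Return one line, minus its newline, without reading past it.

    `raw` defaults to sys.stdin.buffer.raw, the unbuffered FileIO that sits
    underneath sys.stdin in Python 3. Its readline() fetches one byte at a
    time, so whatever follows the newline stays in stdin for the next
    process. Nothing may read through sys.stdin itself beforehand, or that
    buffer could already have swallowed the later lines.
    """
    if raw is None:
        raw = sys.stdin.buffer.raw
    return raw.readline().decode().rstrip("\n")


# Quick check against a pipe instead of the real stdin:
r, w = os.pipe()
os.write(w, b"cc\ndd\n")
os.close(w)
raw = os.fdopen(r, "rb", 0)   # unbuffered FileIO, like sys.stdin.buffer.raw
line = read_input_line(raw)
remaining = os.read(r, 100)   # "dd\n" is still in the pipe
raw.close()
print(line, remaining)        # cc b'dd\n'
```

In a Python 3 child.py, the body would then be `print("py: got input: {}".format(read_input_line()))`, run without `-u`.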
edi_allen answered Feb 23 '26 15:02


