I'm writing a Python script that reads input through a pipe from another command, like so:
batch_job | myparser
My script myparser processes the output of batch_job and writes to its own stdout. My problem is that I want to see the output immediately (the output of batch_job is processed line by line), but there appears to be this notorious stdin buffering (allegedly 4KB; I haven't verified) which delays everything.
The problem has already been discussed here, here, and here.
I tried the following:
- os.fdopen(sys.stdin.fileno(), 'r', 0)
- -u in my hashbang: #!/usr/bin/python -u
- export PYTHONUNBUFFERED=1 right before calling the script

My Python version is 2.4.3 - I have no possibility of upgrading or installing any additional programs or packages. How can I get rid of these delays?
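For reference, my processing loop is essentially the following (simplified; the upper-casing is a hypothetical stand-in for the real per-line work):

```python
import sys

def parse(stream, out):
    # Process input line by line, writing each result immediately.
    for line in stream:          # the delay appears at this read
        out.write(line.upper())  # hypothetical stand-in for the real processing
        out.flush()              # stdout is flushed, yet output still arrives late

if __name__ == '__main__':
    parse(sys.stdin, sys.stdout)
```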
I've encountered the same issue with legacy code. It appears to be a problem with the implementation of Python 2's file object's __next__ method; it uses a Python-level buffer (which -u/PYTHONUNBUFFERED=1 doesn't affect, because those only unbuffer the stdio FILE*s themselves, while file.__next__'s buffering is unrelated; similarly, stdbuf/unbuffer can't change any of the buffering at all, because Python replaces the default buffer made by the C runtime: the last thing file.__init__ does for a newly opened file is call PyFile_SetBufSize, which uses setvbuf/setbuf [the APIs] to replace the default stdio buffer).
The problem is seen when you have a loop of the form:

for line in sys.stdin:

where the first call to __next__ (called implicitly by the for loop to get each line) ends up blocking to fill the buffer before producing a single line.
There are three possible fixes:
1. (Only on Python 2.6+) Rewrap sys.stdin with the io module (backported from Python 3 as a built-in) to bypass file entirely in favor of the (frankly superior) Python 3 design, which uses a single system call at a time to populate the buffer without blocking for the full requested read to occur; if it asks for 4096 bytes and gets 3, it'll check whether a line is available and produce it if so:
import io
import sys

# Add buffering=0 argument if you won't always consume stdin completely, so you
# can't lose data in the wrapper's buffer. It'll be slower with buffering=0 though.
with io.open(sys.stdin.fileno(), 'rb', closefd=False) as stdin:
    for line in stdin:
        # Do stuff with the line
This will typically be faster than option 2, but it's more verbose and requires Python 2.6+. It also allows the rewrap to be Unicode-friendly: change the mode to 'r' and optionally pass the known encoding of the input (if it's not the locale default) to seamlessly get unicode lines instead of (ASCII-only) str.
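As a sketch of that Unicode-friendly variant (the text_lines helper name and the utf-8 encoding are my own illustrative choices, not part of the original answer):

```python
import io
import sys

def text_lines(fd, encoding='utf-8'):
    # Rewrap a byte-oriented file descriptor as a buffered text stream
    # (Python 2.6+ io module). encoding='utf-8' is illustrative; drop it
    # to use the locale default. closefd=False leaves the fd open for the caller.
    return io.open(fd, 'r', encoding=encoding, closefd=False)

# Typical use in the parser: each line is a unicode object.
# with text_lines(sys.stdin.fileno()) as stdin:
#     for line in stdin:
#         ...
```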
2. (Any version of Python) Work around the problem with file.__next__ by using file.readline instead; despite nearly identical intended behavior, readline doesn't do its own (over)buffering; it delegates to C stdio's fgets (default build settings) or a manual loop calling getc/getc_unlocked into a buffer that stops exactly when it hits end of line. By combining it with two-arg iter you can get nearly identical code without excess verbosity (it'll probably be slower than the prior solution, depending on whether fgets is used under the hood and how the C runtime implements it):
# '' is the sentinel that ends the loop; readline returns '' at EOF
for line in iter(sys.stdin.readline, ''):
    # Do stuff with line
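For a quick self-contained illustration of the two-arg iter pattern (using StringIO in place of a pipe; the lines_via_readline name is my own hypothetical choice):

```python
from io import StringIO  # cStringIO on Python 2

def lines_via_readline(stream):
    # Yield lines one at a time via readline, sidestepping file.__next__'s
    # read-ahead; '' is exactly what readline returns at EOF, so the
    # two-arg iter stops there.
    return iter(stream.readline, '')

collected = list(lines_via_readline(StringIO("first\nsecond\n")))
# collected == ['first\n', 'second\n']
```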
3. Move to Python 3, which doesn't have this problem. :-)