
Bash pipe to python

I need to consume the output of a bash command through a pipe in real time. For example:

for i in $(seq 1 4); do echo $i; sleep 1; done | ./script.py

where script.py contains:

#!/usr/bin/env python
import sys

for line in sys.stdin.readlines():
    print line

I expect each number to be printed as soon as it becomes available, but the Python script waits for the bash command to finish before printing anything.

I looked at this related answer, but that didn't solve my problem. How do I go about achieving this in python?

Sridhar Iyer asked May 05 '15 03:05



2 Answers

The first problem is that readlines reads all the lines into a list. It can't do that until all of the lines are present, which won't be until stdin has reached EOF.

But you don't actually need a list of the lines, just some iterable of the lines. A file, like sys.stdin, already is such an iterable, and a lazy one: it yields each line as soon as it's available, instead of waiting to produce them all at once.

So:

for line in sys.stdin:
    print line

Whenever you find yourself reaching for readlines, ask yourself whether you really need it. The answer will always be no. (Well, except when you want to call it with an argument, or on some defective not-quite-file-like object.) See Readlines Considered Silly for more.
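To see the difference in timing, here is a minimal sketch (not part of the original answer) that feeds three lines through an os.pipe with a pause after each one and records when the reader receives them. The helper name timed_read and the 0.2 second pause are made up for illustration. The incremental branch calls readline directly, which returns each line as it arrives.

```python
import os
import threading
import time

def timed_read(use_readlines, pause=0.2):
    """Read three slowly produced lines from a pipe.

    Returns a list of (text, seconds_since_start) pairs.
    """
    read_fd, write_fd = os.pipe()

    def producer():
        # Emit one line, flush it, then pause; closing the pipe signals EOF.
        with os.fdopen(write_fd, "w") as writer:
            for i in range(1, 4):
                writer.write("%d\n" % i)
                writer.flush()
                time.sleep(pause)

    worker = threading.Thread(target=producer)
    start = time.time()
    worker.start()
    seen = []
    with os.fdopen(read_fd, "r") as reader:
        if use_readlines:
            lines = reader.readlines()          # blocks until EOF
        else:
            lines = iter(reader.readline, "")   # one line per call
        for line in lines:
            seen.append((line.strip(), time.time() - start))
    worker.join()
    return seen

eager = timed_read(use_readlines=True)
lazy = timed_read(use_readlines=False)
```

With readlines, every timestamp in eager falls after the writer has closed the pipe; with readline, the first entry in lazy is recorded almost immediately.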


But meanwhile, there's a second problem. It's not that Python is buffering its stdin, or that the other process is buffering its stdout, but that the file-object iterator itself is doing internal buffering, which may (depending on your platform—but on most POSIX platforms, it usually will) prevent you from getting to the first line until EOF, or at least until a lot of lines have been read.

This is a known problem with Python 2.x, which has been fixed in 3.x,* but that doesn't help you unless you're willing to upgrade.

The solution is mentioned in the Command line and environment docs, and in the manpage on most systems, but buried in the middle of the -u flag documentation:

Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option. To work around this, you will want to use "sys.stdin.readline()" inside a "while 1:" loop.

In other words:

while True:
    line = sys.stdin.readline()
    if not line:
        break
    print line

Or:

for line in iter(sys.stdin.readline, ''):
    print line

For a different problem, in this answer, Alex Martelli points out that you can always just ignore sys.stdin and re-fdopen the file descriptor, which gives you a wrapper around a POSIX fd instead of a C stdio handle. But that's neither necessary nor sufficient for this question, because the problem isn't the C stdio buffering itself, but the way the file.__iter__ buffering interacts with it.


* Python 3.x doesn't use the C stdio library's buffering anymore; it does everything itself, in the types in the io module, which means the iterator can just share the same buffer the file object itself is using. While io is available on 2.x as well, it's not the default thing you get for open—or for the stdio file handles, which is why it doesn't help here.

abarnert answered Sep 28 '22 01:09


With Python 2.7.9 (and probably all Python versions prior to 3.x), this does what you expect:

#!/usr/bin/python

import sys

while True:
    line = sys.stdin.readline()
    if not line:
        break
    print line

You can also do:

#!/usr/bin/python

import sys

for line in iter(sys.stdin.readline, ''):
    print line
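One way to check that a reader really is incremental, sketched here under the assumption of Python 3, is a ping-pong test: the parent sends one line, then waits for the reply before sending the next. If the child held its input until EOF, the first reply would never arrive. The CHILD script and the ping_pong helper are illustrative names, not part of the answers above.

```python
import subprocess
import sys

# Child process: echo each input line back immediately, using the readline loop.
CHILD = r'''
import sys
for line in iter(sys.stdin.readline, ""):
    sys.stdout.write("got " + line)
    sys.stdout.flush()
'''

def ping_pong(values):
    """Send values one at a time, reading one reply before sending the next."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,   # text mode pipes
    )
    replies = []
    for value in values:
        proc.stdin.write("%s\n" % value)
        proc.stdin.flush()
        # This returns only because the child answers before seeing EOF.
        replies.append(proc.stdout.readline().rstrip("\n"))
    proc.stdin.close()
    proc.wait()
    proc.stdout.close()
    return replies

replies = ping_pong([1, 2, 3, 4])
```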

On Python 3.4.3, you can do what abarnert suggests:

#!/usr/local/bin/python3

import sys

for line in sys.stdin:
    print(line)

You can also reopen sys.stdin through the io module, which is what Python 3 uses for its file objects:

#!/usr/bin/python

import sys, io

for line in io.open(sys.stdin.fileno()):
    print(line)

The first, second, and last methods work on Python 2.7.6 and 2.7.9 as well as Python 3.4.3 on OS X (under Python 3, print line must be written as print(line)); the third method works only on Python 3.
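The io.open approach can be tried without touching the real stdin. The sketch below wraps the read end of an os.pipe the same way; lines_from_fd is a hypothetical helper name, and closefd=False is an added choice so that closing the wrapper leaves the underlying descriptor open.

```python
import io
import os

def lines_from_fd(fd):
    """Yield text lines from file descriptor fd using an io-module wrapper."""
    # closefd=False: closing the wrapper does not close fd itself.
    with io.open(fd, "r", closefd=False) as stream:
        for line in stream:
            yield line

# Stand-in for stdin: a pipe whose write end is filled and then closed.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"1\n2\n")
os.close(write_fd)

received = list(lines_from_fd(read_fd))
os.close(read_fd)
```

In a script, lines_from_fd(sys.stdin.fileno()) would take the place of the pipe.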

dawg answered Sep 28 '22 00:09