
How do I pipe output of one python script to another python script

Tags:

python

I got stuck piping the output of one script into another script (both are Python).

This question is very similar, but (1) it does not provide an answer and (2) my case is slightly different, so I thought opening a new question would be better.

Here is the problem.
Both scripts are almost identical:

receiver.py

import sys
import time

for line in sys.stdin:
    sys.stdout.write(line)
    sys.stdout.flush()
    time.sleep(3)

replicator.py

import sys
import time

for line in sys.stdin:
    sys.stderr.write(line)
    sys.stderr.flush()
    time.sleep(3)

When I run each script on its own in bash or cmd, everything is fine. Both examples below work, and I see the input text in the output:

Works: (one line of output appears every 3 seconds)

cat data.txt | python receiver.py
cat data.txt | python replicator.py

But once I pipe one script into the other, they stop working:

Doesn't work: (nothing appears until the end of the file is reached)

cat data.txt | python receiver.py | python replicator.py

Yet when I pipe a script into another tool, it works again!

Works:

cat data.txt | python receiver.py | cat -n
cat data.txt | python replicator.py | cat -n

And finally, when I remove the blocking sleep() call, it works again:

Removing the timer:

time.sleep(0)

Now it works:

cat data.txt | python receiver.py | python replicator.py

Does anybody know what is wrong with my piping? I am not looking for alternative ways to do it. I just want to learn what is happening here.

UPDATE

Based on the comments, I refined the examples.
Now both scripts not only print the contents of data.txt but also add a timestamp to each line.

receiver.py

import sys
import time
import datetime

for line in sys.stdin:
    sys.stdout.write(str(datetime.datetime.now().strftime("%H:%M:%S"))+'\t')
    sys.stdout.write(line)
    sys.stdout.flush()
    time.sleep(1)

data.txt

Line-A
Line-B
Line-C
Line-D

The result

$> cat data.txt
Line-A
Line-B
Line-C
Line-D

$> cat data.txt | python receiver.py
09:05:44        Line-A
09:05:45        Line-B
09:05:46        Line-C
09:05:47        Line-D

$> cat data.txt | python receiver.py | python receiver.py
09:05:54        09:05:50        Line-A
09:05:55        09:05:51        Line-B
09:05:56        09:05:52        Line-C
09:05:57        09:05:53        Line-D

$> cat test.log | python receiver.py | sed -e "s/^/$(date +"%H:%M:%S") /"
09:17:55        09:17:55        Line-A
09:17:55        09:17:56        Line-B
09:17:55        09:17:57        Line-C
09:17:55        09:17:58        Line-D

$> cat test.log | python receiver.py | cat | python receiver.py
09:36:21        09:36:17        Line-A
09:36:22        09:36:18        Line-B
09:36:23        09:36:19        Line-C
09:36:24        09:36:20        Line-D

As you can see, when I pipe the output of the Python script into itself, the second script waits until the first one has finished, and only then starts to process the data.

However, when I use another tool (sed in this example), the tool receives the data immediately. Why is this happening?

Dark asked Nov 07 '22 19:11


1 Answer

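This is due to the internal read-ahead buffering in file objects when they are iterated (for line in sys.stdin). Instead of returning each line as soon as it arrives, the iterator first tries to fill a larger buffer, so when input trickles in through a pipe, the loop sees nothing until that buffer is full or EOF is reached. The problem is on the reading side, which is why it appears whenever a Python script is the reader, even with cat in between.

So, if we fetch the input one line at a time with readline() instead: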

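import sys
import time
import datetime

while True:
    # readline() returns as soon as one complete line is available,
    # unlike iteration, which waits to fill its read-ahead buffer
    line = sys.stdin.readline()
    if not line:  # an empty string means EOF
        break
    sys.stdout.write(datetime.datetime.now().strftime("%H:%M:%S") + '\t')
    sys.stdout.write(line)
    sys.stdout.flush()  # hand the line to the next process in the pipe now
    time.sleep(1)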

The code will work as expected:

$ cat data.txt | python receiver.py | python receiver.py
09:43:46        09:43:46        Line-A
09:43:47        09:43:47        Line-B
09:43:48        09:43:48        Line-C
09:43:49        09:43:49        Line-D
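The same readline() loop can be written more compactly with the two-argument form of iter(), which keeps calling a function until it returns a sentinel value (here '', which readline() returns at EOF on a text stream). This is a minimal sketch, assuming text-mode streams; the function name relay and its delay parameter are hypothetical, introduced only so the loop can be tried out with in-memory streams.

```python
import datetime
import time


def relay(src, dst, delay=1.0):
    """Copy src to dst line by line, prefixing each line with HH:MM:SS and a tab.

    Hypothetical helper: iter(src.readline, '') calls readline() repeatedly and
    stops when it returns '' (EOF), so each line is handled as soon as it can
    be read instead of after a read-ahead buffer fills.
    """
    for line in iter(src.readline, ''):
        dst.write(datetime.datetime.now().strftime("%H:%M:%S") + '\t')
        dst.write(line)
        dst.flush()  # pass the line on to the next process right away
        if delay:
            time.sleep(delay)
```

In receiver.py this would be called as relay(sys.stdin, sys.stdout) after importing sys; it behaves like the while loop above.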

Documentation (Python 2, description of the -u command-line option):

... Note that there is internal buffering in file.readlines() and File Objects (for line in sys.stdin) which is not influenced by this option. To work around this, you will want to use file.readline() inside a while 1: loop.

NOTE: This file-object read-ahead behavior was fixed in Python 3.
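One way to check the Python 3 behavior is a small experiment, assuming Python 3.7+ (for text=True): a hypothetical child program writes three lines, flushing and pausing 0.3 s after each, and the parent records when plain for-line iteration, as in the original receiver.py, yields each one. WRITER and arrival_times are illustrative names, not part of the answer above.

```python
import subprocess
import sys
import time

# Hypothetical writer: emits three lines, flushing after each one and
# pausing 0.3 s between them, much like receiver.py does with sleep().
WRITER = (
    "import sys, time\n"
    "for name in ('Line-A', 'Line-B', 'Line-C'):\n"
    "    sys.stdout.write(name + '\\n')\n"
    "    sys.stdout.flush()\n"
    "    time.sleep(0.3)\n"
)


def arrival_times(cmd):
    """Run cmd and return (line, seconds since start) for each line read."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    start = time.monotonic()
    seen = []
    # Plain iteration over the pipe; under Python 3 each line is yielded
    # once it has arrived, without waiting for EOF.
    for line in proc.stdout:
        seen.append((line.rstrip("\n"), time.monotonic() - start))
    proc.stdout.close()
    proc.wait()
    return seen


if __name__ == "__main__":
    for name, t in arrival_times([sys.executable, "-c", WRITER]):
        print(f"{t:5.2f}s  {name}")
```

If the lines were held back until EOF, the three recorded times would be nearly identical; spaced-out times (roughly 0.3 s apart) indicate that each line is delivered as soon as it is written.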

Juan Diego Godoy Robles answered Nov 15 '22 06:11