I got stuck piping the output of one Python script into another Python script.
A similar question already exists, but (1) it has no answer and (2) my case is slightly different, so I thought opening a new question would be better.
Here is the problem.
Both scripts are almost identical:
receiver.py
import sys
import time
for line in sys.stdin:
    sys.stdout.write(line)
    sys.stdout.flush()
    time.sleep(3)
replicator.py
import sys
import time
for line in sys.stdin:
    sys.stderr.write(line)
    sys.stderr.flush()
    time.sleep(3)
When I run these scripts one at a time in bash or cmd, everything is fine. Both examples below work, and I see the input text in the output:
Works: (one line of output appears every 3 seconds)
cat data.txt | python receiver.py
cat data.txt | python replicator.py
But once I pipe one script into the other, they stop working:
Doesn't work: (nothing appears until the end of the file is reached)
cat data.txt | python receiver.py | python replicator.py
Yet when I pipe the first script into another tool, it works again!
Works:
cat data.txt | python receiver.py | cat -n
cat data.txt | python replicator.py | cat -n
And finally, when I remove the blocking sleep() call, it works again:
Removing the timer:
time.sleep(0)
Now it works:
cat data.txt | python receiver.py | python replicator.py
Does anybody know what is wrong with my piping? I am not looking for alternative ways to do it. I just want to learn what is happening here.
Based on the comments, I refined the examples.
Now both scripts not only print the content of data.txt but also add a timestamp to each line.
receiver.py
import sys
import time
import datetime
for line in sys.stdin:
    sys.stdout.write(str(datetime.datetime.now().strftime("%H:%M:%S"))+'\t')
    sys.stdout.write(line)
    sys.stdout.flush()
    time.sleep(1)
data.txt
Line-A
Line-B
Line-C
Line-D
The result
$> cat data.txt
Line-A
Line-B
Line-C
Line-D
$> cat data.txt | python receiver.py
09:05:44 Line-A
09:05:45 Line-B
09:05:46 Line-C
09:05:47 Line-D
$> cat data.txt | python receiver.py | python receiver.py
09:05:54 09:05:50 Line-A
09:05:55 09:05:51 Line-B
09:05:56 09:05:52 Line-C
09:05:57 09:05:53 Line-D
$> cat test.log | python receiver.py | sed -e "s/^/$(date +"%H:%M:%S") /"
09:17:55 09:17:55 Line-A
09:17:55 09:17:56 Line-B
09:17:55 09:17:57 Line-C
09:17:55 09:17:58 Line-D
$> cat test.log | python receiver.py | cat | python receiver.py
09:36:21 09:36:17 Line-A
09:36:22 09:36:18 Line-B
09:36:23 09:36:19 Line-C
09:36:24 09:36:20 Line-D
As you can see, when I pipe the output of the Python script into itself, the second script waits until the first one has finished, and only then starts to process the data.
However, when I use another tool (sed in this example), the tool receives the data immediately. Why is this happening?
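This is due to the internal read-ahead buffering that Python 2 file objects use when you iterate over them (for line in sys.stdin): the loop body does not get any line until that buffer has been filled or the input reaches end of file. When the upstream process is cat, or when the sleep is removed, end of file arrives almost immediately, so the delay goes unnoticed.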
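The effect can be illustrated with a toy model. The function delivery_times below is hypothetical and deliberately simplified (it is not CPython's actual read-ahead code): it replays timed writes into a pipe and reports when a loop would see each line, either line by line as with readline(), or only once an assumed buffer size is reached or EOF arrives.

```python
from typing import List, Optional, Tuple

# (arrival time in seconds, text written to the pipe); None text marks EOF
Event = Tuple[float, Optional[str]]


def delivery_times(writes: List[Tuple[float, str]], eof_time: float,
                   bufsize: Optional[int] = None) -> List[Tuple[str, float]]:
    """Return (line, time the loop body sees it) for every line written.

    bufsize=None models readline(): a line is handed out as soon as the
    write containing its newline arrives. An integer bufsize models a
    read-ahead iterator that hands out nothing until at least `bufsize`
    characters are buffered or the writer closes the pipe (EOF).
    """
    events: List[Event] = list(writes) + [(eof_time, None)]
    delivered: List[Tuple[str, float]] = []
    pending = ""  # received from the pipe but not yet given to the loop
    for t, text in events:
        at_eof = text is None
        if text is not None:
            pending += text
        if bufsize is None or at_eof or len(pending) >= bufsize:
            lines = pending.splitlines(keepends=True)
            # An unfinished last line stays buffered until more data or EOF.
            if lines and not lines[-1].endswith("\n") and not at_eof:
                pending = lines.pop()
            else:
                pending = ""
            delivered.extend((line, t) for line in lines)
    return delivered


if __name__ == "__main__":
    # data.txt from the question, one write per second as in receiver.py
    writes = [(0.0, "Line-A\n"), (1.0, "Line-B\n"),
              (2.0, "Line-C\n"), (3.0, "Line-D\n")]
    print("readline :", delivery_times(writes, eof_time=4.0))
    print("readahead:", delivery_times(writes, eof_time=4.0, bufsize=8192))
```

With the four short lines of data.txt (28 characters in total) and a much larger buffer (8192 is an assumed value here), every line is delivered only at EOF, which matches the "nothing appears until the end" symptom.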
So if we read the input one line at a time with readline() instead:
import sys
import time
import datetime
while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write(str(datetime.datetime.now().strftime("%H:%M:%S"))+'\t')
    sys.stdout.write(line)
    sys.stdout.flush()
    time.sleep(1)
The code will work as expected:
$ cat data.txt | python receiver.py | python receiver.py
09:43:46 09:43:46 Line-A
09:43:47 09:43:47 Line-B
09:43:48 09:43:48 Line-C
09:43:49 09:43:49 Line-D
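The while/readline loop is often written more compactly with the two-argument form of iter(), which keeps calling readline() until it returns the empty string that signals EOF; this idiom works the same way in Python 2 and 3. The sketch below is written for Python 3 and wraps the timestamping loop in a function; the name timestamp_lines and its delay parameter are hypothetical, chosen so the loop can be tried on in-memory streams.

```python
import datetime
import io
import time


def timestamp_lines(src, dst, delay=1.0):
    """Copy src to dst line by line, prefixing each line with HH:MM:SS and a tab.

    iter(src.readline, "") calls src.readline() repeatedly and stops at the
    first "" (end of file), so each line is handled as soon as readline()
    returns it, without waiting for a read-ahead buffer to fill.
    """
    for line in iter(src.readline, ""):
        dst.write(datetime.datetime.now().strftime("%H:%M:%S") + "\t" + line)
        dst.flush()
        time.sleep(delay)


if __name__ == "__main__":
    # Quick check on an in-memory stream instead of a real pipe.
    out = io.StringIO()
    timestamp_lines(io.StringIO("Line-A\nLine-B\n"), out, delay=0)
    print(out.getvalue(), end="")
```

In receiver.py itself, the call would be timestamp_lines(sys.stdin, sys.stdout) after importing sys.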
From the Python 2 documentation of the -u command-line option:
... Note that there is internal buffering in file.readlines() and File Objects (for line in sys.stdin) which is not influenced by this option. To work around this, you will want to use file.readline() inside a while 1: loop.
NOTE: This file-object read-ahead behaviour was fixed in Python 3, where for line in sys.stdin hands each line to the loop as soon as it arrives.
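One way to check this on Python 3 is to rebuild the question's pipeline with the subprocess module. In the sketch below (the helper name run_pipeline and the one-second delay are assumptions for illustration), a producer writes one line per second and a consumer uses a plain for line in sys.stdin loop, stamping each line with the moment it was received. If the consumer were waiting for a read-ahead buffer, both lines would arrive at nearly the same moment; a gap of about one second shows that they are handed over as they are written.

```python
import subprocess
import sys

# Producer: writes one line, flushes, then sleeps, like receiver.py.
PRODUCER = (
    "import sys, time\n"
    "for name in ('Line-A', 'Line-B'):\n"
    "    sys.stdout.write(name + '\\n')\n"
    "    sys.stdout.flush()\n"
    "    time.sleep(1)\n"
)

# Consumer: the plain `for line in sys.stdin` loop, prefixing each line
# with the monotonic time at which the loop body received it.
CONSUMER = (
    "import sys, time\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write('%.3f\\t%s' % (time.monotonic(), line))\n"
    "    sys.stdout.flush()\n"
)


def run_pipeline():
    """Run `producer | consumer` and return a list of (arrival_time, text)."""
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER],
                                stdout=subprocess.PIPE)
    consumer = subprocess.Popen([sys.executable, "-c", CONSUMER],
                                stdin=producer.stdout, stdout=subprocess.PIPE,
                                text=True)
    producer.stdout.close()  # let the producer see SIGPIPE if the consumer exits
    out, _ = consumer.communicate()
    producer.wait()
    pairs = []
    for row in out.splitlines():
        stamp, text = row.split("\t", 1)
        pairs.append((float(stamp), text))
    return pairs


if __name__ == "__main__":
    pairs = run_pipeline()
    for stamp, text in pairs:
        print(stamp, text)
    print("gap between arrivals: %.1f s" % (pairs[1][0] - pairs[0][0]))
```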