
Can I trust shell's `read` to not buffer input?

Tags: bash, shell, posix

seq 99999 | (head -n2; cat) | head -n5 
1
2

1861
1862

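In the above example, head -n2 reads much more than two lines from the pipe (here, up to partway through line 1860) but prints only the first two, so cat never sees the data that head consumed.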

Using read does not have this problem:

seq 99999 | (for i in 1 2; do read -r s; printf %s\\n "$s"; done; cat) | head -n5
1
2
3
4
5

The question is: can this be relied upon?

I have tried reading about POSIX read and shell without finding the answer.

(What I really want is approximately: a_program | (sed 's/A/B/;7q'; cat) | program_b, where I modify the first ~7 lines (text), then cat the rest (including binary data). If read works, that's fine, but I want to know how safe it is.)

Richard Tingstad asked Oct 11 '25 11:10


1 Answer

Yes, you can depend on the read built-in to (appear to) not buffer input. Normal operation is for read to take input a byte at a time to ensure that it does not read past the next newline character.
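As a rough sketch of how this could be applied to the pipeline in the question, the function below rewrites the first few lines with read and leaves everything after them for cat. The name edit_head and the A-to-B substitution are illustrative assumptions, not something from the question, and the sketch assumes the leading lines are plain text that end in a newline and contain no NUL bytes.

```shell
# edit_head is a hypothetical helper name used only for illustration.
# It rewrites the first N lines of stdin, then passes the rest through.
edit_head() {
    n=$1
    i=0
    # IFS= and -r keep leading whitespace and backslashes intact.
    while [ "$i" -lt "$n" ] && IFS= read -r line; do
        # Replace the first "A" on the line with "B", like sed 's/A/B/'.
        case $line in
            *A*) line=${line%%A*}B${line#*A} ;;
        esac
        printf '%s\n' "$line"
        i=$((i + 1))
    done
    # read stopped right after the last newline it consumed,
    # so cat starts exactly at the next line.
    cat
}

printf 'A1\nA2\nA3\n' | edit_head 2
# prints:
# B1
# B2
# A3
```

In the question's terms this would be used as a_program | edit_head 7 | program_b. If the input ends inside the first N lines without a final newline, read returns non-zero and this sketch drops that partial line.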

Some shells (including Bash) optimize this by reading a block of data at a time and then using "seek" to move the file offset back to just after the newline that ended the line. However, "seek" cannot be used on some inputs (including pipes, named pipes (aka FIFOs), and sockets), so there the optimization is not applicable and reading has to be done byte-by-byte, which is slow. The read commands in the code in the question are taking input from a pipe, so they will read byte-by-byte.
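A small way to see that both strategies leave the input in the same place is to run the same read-then-cat group once on a regular file (seekable) and once on a pipe (not seekable). This is only a sketch: it assumes seq and mktemp are available, as on typical Linux systems, and the variable names are made up for the example.

```shell
# Feed the same five lines through a regular file and through a pipe.
tmp=$(mktemp)
seq 5 > "$tmp"

# Regular file: the shell may read a block and seek back after line 1.
from_file=$( { IFS= read -r first; cat; } < "$tmp" )

# Pipe: seeking is impossible, so read must stop at the newline itself.
from_pipe=$( seq 5 | { IFS= read -r first; cat; } )

rm -f "$tmp"

# In both cases cat receives lines 2 through 5.
printf '%s\n' "$from_file"
[ "$from_file" = "$from_pipe" ] && echo same
```

How the shell gets there differs between the two cases (tracing system calls, for example with strace where available, would show it), but the data left for cat is the same.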

For more technical details, see the answer by Stéphane Chazelas to "Why some shells read builtin fail to read the whole line from file in /proc?" on Unix & Linux Stack Exchange. (Relevant information starts at paragraph 3.)

Also see "head eats extra characters".

pjh answered Oct 14 '25 15:10