
Does head consume additional characters from stdin?

Tags:

linux

bash

shell

When I execute the following head command:

yes 123456789 | ( head -n 1; head -n 1 )

I get:

123456789
3456789

While I would expect:

123456789
123456789

It also puzzles me that when I execute:

echo -e "123456789\n123456789\n123456789\n123456789\n123456789\n" | \
( head -n 1; head -n 1 )

I get:

123456789

instead of:

123456789
123456789

I guess there is something that I do not understand. Do you know why I get this behaviour?

marcmagransdeabril asked Mar 03 '14 13:03


2 Answers

Yes, head is definitely reading more than one line. It uses buffered I/O. When reading from a file it appears to read line by line, but from a pipe it reads something like 512 bytes at a time, which is consistent with what you see. The 3456789 is probably not the 2nd line but the tail of the 52nd: 512 bytes cover 51 complete 10-byte lines plus the first two characters of the next one. To experiment with this, replace yes with something whose lines you can tell apart; cat somefile | works nicely.
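The arithmetic behind that guess can be checked with a small sketch. The helper name leftover_line is hypothetical, and it assumes the first head consumes exactly one chunk of the given size from yes 123456789, which real implementations do not promise:

```shell
# Hypothetical helper: what the second head would print if the first head
# consumed exactly one chunk of $1 bytes from `yes 123456789`.
leftover_line() {
    chunk=$1
    line=123456789
    record=$(( ${#line} + 1 ))     # 9 digits plus the newline = 10 bytes
    used=$(( chunk % record ))     # bytes already taken from the split line
    # Print what remains of the line that the chunk boundary cut in two.
    printf '%s\n' "$line" | cut -c "$(( used + 1 ))-"
}

leftover_line 512     # 512 = 51*10 + 2
leftover_line 1024    # 1024 = 102*10 + 4
```

A 512-byte chunk leaves 3456789, as in the question, and a 1024-byte chunk leaves 56789.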

Peter Westlake answered Sep 19 '22 12:09


Input and output are completely different beasts. The manual for head tells you what output to expect, but it says nothing about how the input is processed.

So the short answer is: you're relying on undocumented things.

Now, if you are interested in what's going on behind the scenes, you can add some tracing

| ( strace head -n 1; tail )

in your 2nd example (sorry for the strace format, I'm on Cygwin at the moment):

[...]
 24   35374 [main] head 1784 read: 51 = read(0, 0x22C700, 1024)

The first head process tries to read the input in one big chunk (1024 bytes), then probably looks for a newline character in the buffer. At least, that's how I would implement it. As you can see, that single read returned all 51 bytes of the input, so there's nothing left for the next process.
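The figure of 51 can be confirmed by counting the bytes of that input. This is a minimal sketch: sample_input is a hypothetical helper that uses printf to emit the same bytes as the echo -e command in the question (five 10-byte lines plus the extra newline that echo appends), so it does not depend on how a given shell's echo treats -e:

```shell
# Hypothetical helper reproducing the bytes of the question's echo -e call:
# five "123456789\n" lines, then one empty line for echo's own newline.
sample_input() {
    printf '%s\n' 123456789 123456789 123456789 123456789 123456789 ''
}

# Count the bytes written to the pipe.
sample_input | wc -c
```

wc reports 51, well under the 1024 bytes requested, so one read is enough to drain the pipe.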

In your 1st example, the main difference is that the input is endless, so even though the first head reads a big chunk, there is still input left for the second process. The boundary is arbitrary: it depends on the chunk size, the implementation of head, how fread (buffered I/O) is implemented, and so on. For example, on my system, this was the output:

123456789
56789
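For contrast, here is a sketch of a consumer that takes only what it uses. It relies on the shell's read builtin, which POSIX requires not to read past the end of the line, instead of head; the function name first_two_lines is hypothetical:

```shell
# Hypothetical helper: take exactly two lines from stdin with the read
# builtin, which stops at each newline instead of buffering a whole chunk.
first_two_lines() {
    IFS= read -r first
    IFS= read -r second
    printf '%s\n' "$first" "$second"
}

yes 123456789 | first_two_lines
```

Here both lines come out complete, because nothing beyond each newline is taken from the pipe.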
Karoly Horvath answered Sep 19 '22 12:09