 

Shell redirection and file I/O durations

Tags:

linux

bash

io

I am redirecting some output to a file in three different ways, and each takes a clearly different amount of time.

$ >/tmp/file ; time for i in {1..1000}; do for j in {1..1000}; do echo $i $j >> /tmp/file; done; done

real    0m33.467s
user    0m21.170s
sys     0m11.919s

$ >/tmp/file ; exec 3>/tmp/file; time for i in {1..1000}; do for j in {1..1000}; do echo $i $j >&3; done; done; exec 3>&-

real    0m24.211s
user    0m17.181s
sys     0m7.002s

$ >/tmp/file ; time for i in {1..1000}; do for j in {1..1000}; do echo $i $j; done; done >> /tmp/file 

real    0m17.038s
user    0m13.072s
sys     0m3.945s

Can someone explain the differences here? My current understanding/doubts are:

  1. The 1st is slowest because it opens and closes the file on every iteration, while the others open it only once. Is that right? And what about buffering? Normally I would expect all output to be buffered, in which case we should not see such large time differences.
  2. In the 3rd, if all the output is only written at the end of the outer loop, where is it stored while the loops are still executing? Presumably in memory. Does that mean I can run out of memory if I echo a lot of output and only write it at the end?
  3. Is the 2nd more like the 1st or the 3rd? Why is it so different from either?

PS: I have run the above commands a couple of times and found the times to be consistent, so the differences I see must have some real cause.

Vivek asked Nov 26 '12 18:11

1 Answer

  1. The first version runs echo $i $j >> /tmp/file a million times; each iteration opens the file for appending, writes one line, and closes it again.

  2. Doing echo $i $j >&3 a million times differs from the first in that it does not open and close the file each time, but writes to file descriptor #3. The exec 3>/tmp/file opens the file for writing once and keeps it open as descriptor #3. But whenever a command's stdout is redirected to descriptor #3 (the effect of the >&3 after the echo), the shell still has to set up this redirection before executing the command and restore the previous stdout afterwards.

  3. Redirecting the output of the complete loop with >> /tmp/file is much easier for the shell: it can simply execute each echo without setting up additional file descriptors, because the assignment of stdout changes only once. (Nothing is stored up until the end of the loop: each echo writes straight to the already-open file, so you cannot run out of memory this way.)
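Whatever the per-command overhead, all three styles produce exactly the same file contents. A quick sanity check at a smaller 100x100 scale (a sketch; the file names under /tmp are made up here):

```shell
#!/usr/bin/env bash
f1=/tmp/per_echo f2=/tmp/fd3 f3=/tmp/per_loop

# 1. Open/append/close on every single echo
>"$f1"; for i in {1..100}; do for j in {1..100}; do echo $i $j >> "$f1"; done; done

# 2. Open once as fd 3, redirect each echo to it, then close fd 3
>"$f2"; exec 3>"$f2"
for i in {1..100}; do for j in {1..100}; do echo $i $j >&3; done; done
exec 3>&-

# 3. Open once by redirecting the whole loop
>"$f3"; for i in {1..100}; do for j in {1..100}; do echo $i $j; done; done >> "$f3"

# All three files are byte-identical: 10,000 lines each
cmp "$f1" "$f2" && cmp "$f2" "$f3" && echo identical
```

Only the amount of shell bookkeeping per line differs, which is exactly what the timings above measure.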

About buffering: in all three cases the underlying file system buffers access to the physical file, so there is no difference at that level. Moreover, most Linux systems have a tmpfs mounted on /tmp, which makes everything here a pure in-memory operation anyway. So you are not measuring I/O performance but shell command execution overhead. You can verify this by increasing the number of bytes written (prepend a constant value to the line echo prints):
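Whether /tmp really is tmpfs on a given machine is easy to check (a sketch; the output depends on your system, and the -T flag assumes GNU df):

```shell
# Show the filesystem type backing /tmp in an extra "Type" column;
# "tmpfs" means writes to /tmp stay in RAM (GNU coreutils df).
df -T /tmp
```

If the type column shows ext4, xfs, or similar instead, /tmp lives on disk, though the page cache still absorbs writes of this size.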

>/tmp/file ; time for i in {1..1000}; do for j in {1..1000}; do echo "1000000 $i $j" >> /tmp/file; done; done

>/tmp/file ; exec 3>/tmp/file; time for i in {1..1000}; do for j in {1..1000}; do echo "1000000 $i $j" >&3; done; done; exec 3>&-

>/tmp/file ; time for i in {1..1000}; do for j in {1..1000}; do echo "1000000 $i $j"; done; done >> /tmp/file

On my PC this takes just the same time as without the constant "1000000 ", but writes twice as many bytes to the file.
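The size difference itself is easy to verify without any timing. A sketch at 100x100 scale (file names made up here): the "1000000 " prefix adds exactly 8 bytes per line, so over 10,000 lines the padded file should be exactly 80,000 bytes larger:

```shell
#!/usr/bin/env bash
# Write the same 10,000 lines with and without the constant prefix.
>/tmp/short; for i in {1..100}; do for j in {1..100}; do echo "$i $j"; done; done >> /tmp/short
>/tmp/long;  for i in {1..100}; do for j in {1..100}; do echo "1000000 $i $j"; done; done >> /tmp/long

# Each of the 10,000 lines gains the 8-byte prefix "1000000 ".
echo $(( $(wc -c < /tmp/long) - $(wc -c < /tmp/short) ))   # 80000
```

That the extra bytes cost nothing measurable is what shows the bottleneck is the shell, not the writes.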

holgero answered Oct 05 '22 11:10