I am redirecting some output to a file in three different ways, and each of them takes a clearly different amount of time.
$ >/tmp/file ; time for i in {1..1000}; do for j in {1..1000}; do echo $i $j >> /tmp/file; done; done
real 0m33.467s
user 0m21.170s
sys 0m11.919s
$ >/tmp/file ; exec 3>/tmp/file; time for i in {1..1000}; do for j in {1..1000}; do echo $i $j >&3; done; done; exec 3>&-
real 0m24.211s
user 0m17.181s
sys 0m7.002s
$ >/tmp/file ; time for i in {1..1000}; do for j in {1..1000}; do echo $i $j; done; done >> /tmp/file
real 0m17.038s
user 0m13.072s
sys 0m3.945s
Can someone explain the differences here? My current understanding/doubts are below.
PS: I have run the above commands a couple of times and found the times to be consistent, so the differences I see must have real causes.
The first version runs echo $i $j >> /tmp/file a million times, which opens the file for appending, writes to it and closes it each time.
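If you want to see the per-echo open/write/close yourself, running a single iteration under strace (assuming strace is installed on your system) makes it visible; in the first loop this sequence happens a million times:

$ strace -e trace=openat,write,close bash -c 'echo 1 1 >> /tmp/file'

Near the end of the trace you should see an openat() with O_APPEND, a write() and a close() for that one echo, alongside the calls bash makes at startup.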
Running echo $i $j >&3 a million times differs from the first version in that it does not open and close the file each time, but writes to file descriptor #3 instead. The exec 3>/tmp/file opens the file for writing and keeps it available as file descriptor #3. When a command then has its stdout redirected to file descriptor #3 (the effect of the >&3 after the echo), the shell still needs to set up this redirection before executing the command and restore the previous stdout afterwards.
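Spelled out by hand, the bookkeeping the shell does for every single echo ... >&3 is roughly the following sketch (internally the shell uses dup2() calls rather than further exec redirections):

exec 4>&1        # save the current stdout in fd 4
exec 1>&3        # point stdout at fd 3, i.e. /tmp/file
echo $i $j       # run the command with the redirected stdout
exec 1>&4 4>&-   # restore stdout and drop the saved copy

Doing that a million times is cheaper than a million open/close pairs, but it is still not free.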
Redirecting the output of the complete loop with >> /tmp/file is much easier for the shell: it can simply execute the echo commands without setting up additional file descriptors, because the assignment of stdout changes only once.
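If you would rather not hang the redirection off the end of the loop, a brace group gives the same single-setup behaviour; by the same reasoning this sketch should time close to the third version:

$ >/tmp/file ; time { for i in {1..1000}; do for j in {1..1000}; do echo $i $j; done; done; } >> /tmp/file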
About buffering: in all three cases the underlying file system will buffer access to the physical file, so there is no difference at that level. Also, most Linux systems have a tmpfs mounted on /tmp, which makes everything you do a pure in-memory operation anyway. So you are not measuring I/O performance here but shell command execution performance. You can prove this by increasing the number of bytes written (add a constant string to the line echo prints):
>/tmp/file ; time for i in {1..1000}; do for j in {1..1000}; do echo "1000000 $i $j" >> /tmp/file; done; done
>/tmp/file ; exec 3>/tmp/file; time for i in {1..1000}; do for j in {1..1000}; do echo "1000000 $i $j" >&3; done; done; exec 3>&-
>/tmp/file ; time for i in {1..1000}; do for j in {1..1000}; do echo "1000000 $i $j"; done; done >> /tmp/file
On my PC this takes just the same time as without the constant "1000000 ", but writes twice as many bytes to the file.
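Two quick checks if you want to confirm this on your own machine (findmnt comes with util-linux; if it is missing, df -T /tmp gives similar information):

$ wc -c /tmp/file     # byte count of the result; roughly doubles with the extra "1000000 " prefix
$ findmnt -T /tmp     # shows which filesystem backs /tmp (often tmpfs)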