
Buffered vs unbuffered IO

I learned that I/O in programs is buffered by default, i.e., data is served to the requesting program from temporary storage. I understand that buffering improves I/O performance (perhaps by reducing system calls). I have seen examples of disabling buffering, such as setvbuf in C. What is the difference between the two modes, and when should one be used over the other?

sud03r asked Sep 20 '09 07:09




2 Answers

You want unbuffered output whenever you want to ensure that the output has been written before continuing. One example is standard error under a C runtime library - this is usually unbuffered by default. Since errors are (hopefully) infrequent, you want to know about them immediately. On the other hand, standard output is buffered simply because it's assumed there will be far more data going through it.
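The difference between the modes can be observed with setvbuf, the function mentioned in the question. The following is a minimal sketch, not a definitive implementation: the helper names `file_size` and `bytes_visible_before_flush` are made up for illustration, and the use of `stat()` assumes a POSIX system. It writes a short message to a stream in a chosen mode and reports how many bytes the OS has already received before the stream is flushed or closed.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>   /* stat(): POSIX, an assumption of this sketch */

/* Hypothetical helper: size the OS currently reports for `path`, or -1 on error. */
static long file_size(const char *path) {
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Hypothetical helper: write `msg` to a new file using buffering `mode`
   (_IOFBF, _IOLBF or _IONBF) and return how many bytes had reached the OS
   before the stream was flushed or closed. Returns -1 on error. */
long bytes_visible_before_flush(const char *path, int mode, const char *msg) {
    assert(path != NULL && msg != NULL);
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return -1;
    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(fp, NULL, mode, BUFSIZ) != 0) {
        fclose(fp);
        return -1;
    }
    fputs(msg, fp);
    long visible = file_size(path);   /* measured while the stream is still open */
    fclose(fp);
    return visible;
}
```

For a message much shorter than BUFSIZ, a call with `_IONBF` should report the full message length, while `_IOFBF` should report 0, because the bytes are still sitting in the process's own buffer at that point.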

Another example is a logging library. If your log messages are held in buffers within your process and the process dumps core, there is a very good chance that the output will never be written.
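One common way to limit that risk, sketched here under the assumption that the log is an ordinary stdio stream, is to flush after every message. The name `log_line` is hypothetical, not part of any real logging library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: append one line to an open log stream and flush it
   immediately, so the message has left the process's buffer before the
   program carries on. Returns 0 on success, -1 on failure. */
int log_line(FILE *log, const char *msg) {
    assert(log != NULL && msg != NULL);
    if (fprintf(log, "%s\n", msg) < 0) return -1;
    return fflush(log) == 0 ? 0 : -1;
}
```

Calling `setvbuf(log, NULL, _IOLBF, BUFSIZ)` right after opening the stream would be a comparable choice. Either way the message is handed to the OS, which protects against a crash of the process but not against a crash of the machine.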

In addition, it's not just system calls that are minimized but disk I/O as well. Let's say a program reads a file one byte at a time. With unbuffered input, every byte costs a separate request, and where nothing else caches the data you go out to the (relatively very slow) disk for every byte even though it probably has to read in a whole block anyway (the disk hardware itself may have buffers, but you're still going out to the disk controller, which is slower than in-memory access).

By buffering, the whole block is read in to the buffer at once then the individual bytes are delivered to you from the (in-memory, incredibly fast) buffer area.
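The effect on the number of requests can be counted directly. This is a rough sketch assuming POSIX `open()`/`read()`; the function name `count_read_calls` and the `MAX_BUF` limit are made up for illustration. The caller consumes the file one byte at a time in both cases, and only the size of the refill buffer changes.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_BUF 4096   /* arbitrary upper limit chosen for this sketch */

/* Hypothetical helper: consume `path` one byte at a time through a user-space
   buffer of `bufsize` bytes. Returns the number of read() system calls made
   (including the final one that reports end of file), or -1 if the file
   cannot be opened. The number of bytes consumed is stored in *total_bytes. */
long count_read_calls(const char *path, size_t bufsize, long *total_bytes) {
    assert(path != NULL && total_bytes != NULL);
    assert(bufsize >= 1 && bufsize <= MAX_BUF);
    char buf[MAX_BUF];
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    long calls = 0;
    *total_bytes = 0;
    for (;;) {
        ssize_t n = read(fd, buf, bufsize);   /* one system call per refill */
        calls++;
        if (n <= 0) break;                    /* 0 means end of file, <0 an error */
        for (ssize_t i = 0; i < n; i++) {
            (void)buf[i];                     /* the consumer takes each byte from memory */
            (*total_bytes)++;
        }
    }
    close(fd);
    return calls;
}
```

With `bufsize` set to 1 this behaves like unbuffered input and needs one system call per byte. With a 4096-byte buffer, a 10000-byte regular file needs only a handful of calls.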

Keep in mind that buffering can take many forms, such as in the following example:

+-------------------+-------------------+
| Process A         | Process B         |
+-------------------+-------------------+
| C runtime library | C runtime library | C RTL buffers
+-------------------+-------------------+
|               OS caches               | Operating system buffers
+---------------------------------------+
|      Disk controller hardware cache   | Disk hardware buffers
+---------------------------------------+
|                   Disk                |
+---------------------------------------+
paxdiablo answered Sep 28 '22 01:09


You want unbuffered output when you already have a large sequence of bytes ready to write to disk and want to avoid an extra copy into a second buffer in the middle.

Buffered output streams accumulate written data in an intermediate buffer and send it to the OS file system only when enough data has accumulated (or flush() is requested). This reduces the number of file system calls. Since file system calls are expensive on most platforms (compared to a short memcpy), buffered output is a net win when performing a large number of small writes. Unbuffered output is generally better when you already have large buffers to send -- copying to an intermediate buffer will not reduce the number of OS calls further, and introduces additional work.
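Both halves of that trade-off can be sketched with a small hand-rolled output buffer. This is an illustration only, assuming POSIX `write()`; the `OutBuf` type, the `outbuf_*` functions and the 4096-byte capacity are hypothetical and are not how any particular C library implements its streams. Small writes are collected with memcpy, and a write at least as large as the buffer skips the intermediate copy.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define OUTBUF_CAP 4096   /* arbitrary capacity chosen for this sketch */

typedef struct {
    int    fd;            /* destination file descriptor */
    size_t used;          /* bytes currently held in data[] */
    long   write_calls;   /* number of write() system calls issued so far */
    char   data[OUTBUF_CAP];
} OutBuf;

/* Send `len` bytes to the OS, retrying on partial writes. */
static int write_all(OutBuf *b, const char *p, size_t len) {
    while (len > 0) {
        ssize_t n = write(b->fd, p, len);
        b->write_calls++;
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Push whatever has accumulated to the OS. */
int outbuf_flush(OutBuf *b) {
    assert(b != NULL);
    if (b->used == 0) return 0;
    int rc = write_all(b, b->data, b->used);
    b->used = 0;
    return rc;
}

/* Small writes are copied into the buffer; large ones bypass it. */
int outbuf_put(OutBuf *b, const void *src, size_t len) {
    assert(b != NULL && src != NULL);
    if (len >= OUTBUF_CAP) {
        if (outbuf_flush(b) != 0) return -1;   /* keep the output in order */
        return write_all(b, src, len);         /* no intermediate copy */
    }
    if (b->used + len > OUTBUF_CAP && outbuf_flush(b) != 0) return -1;
    memcpy(b->data + b->used, src, len);
    b->used += len;
    return 0;
}
```

A thousand 10-byte calls to `outbuf_put` followed by one `outbuf_flush` should result in only a few `write()` calls, whereas issuing each of them unbuffered would cost a thousand.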

Unbuffered output has nothing to do with ensuring your data reaches the disk. Flushing is a separate step, and it applies to buffered and unbuffered streams alike. An unbuffered write does not guarantee that the data has reached the physical disk: the OS file system is free to hold on to a copy of your data for as long as it wants before writing it out, and it is only required to commit it to disk when you explicitly ask for that. Note that in C, fflush() only pushes the stdio buffer down to the OS, so forcing the data onto the disk takes an OS-level call such as fsync() on POSIX. (close() will flush the stream's buffer on your behalf.)
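In C on a POSIX system, the two layers are handled by two different calls: `fflush()` moves data from the stdio buffer to the OS, and `fsync()` asks the OS to commit it to the storage device. The following is a minimal sketch under that assumption; the name `write_durably` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* fsync(): POSIX, an assumption of this sketch */

/* Hypothetical helper: write `len` bytes to `path` and do not return success
   until the OS has been asked to commit them to the device.
   Returns 0 on success, -1 on failure. */
int write_durably(const char *path, const void *data, size_t len) {
    assert(path != NULL && data != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return -1;
    int ok = fwrite(data, 1, len, fp) == len   /* into the stdio buffer */
          && fflush(fp) == 0                   /* stdio buffer -> OS cache */
          && fsync(fileno(fp)) == 0;           /* OS cache -> storage device */
    if (fclose(fp) != 0) ok = 0;
    return ok ? 0 : -1;
}
```

Switching the stream to `_IONBF` would remove the need for the `fflush()` step but not for `fsync()`, which is the point made above: buffering mode and durability are separate concerns.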

4 revs, 3 users 73% answered Sep 28 '22 01:09