Why use buffers to read/write Streams?

After reading various questions on reading and writing Streams, I noticed that the answers all define something like this as the correct way to do it:

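private void CopyStream(Stream input, Stream output)
{
   // A fixed-size 16 KB buffer, reused for every chunk
   byte[] buffer = new byte[16 * 1024];
   int read;
   // Read returns the number of bytes actually read (possibly fewer than
   // buffer.Length), and 0 once the end of the input stream is reached
   while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
   {
      // Write only the bytes that were actually read
      output.Write(buffer, 0, read);
   }
}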

Two questions:

Why read and write in these smaller chunks?

What is the significance of the buffer size used?

asked May 12 '10 at 11:05 by James Hay

1 Answer

If you read one byte at a time, then every byte you read incurs the overhead of calling the function that reads it, plus additional overheads (for example, doing a fileposition += 1 to remember where in the file you are, checking whether you have reached the end of the file, and so on).

If you read 4000 bytes at once, you pay those same overheads only once (in the above example: one function call, one add (fileposition += 4000), and one check to see whether you are at the end of the file). So in terms of overhead, you've just made it 4000 times faster. (In reality there are other costs, so you won't see that big a gain, but you have drastically cut the overheads.)
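To put numbers on the overhead argument, here is a minimal sketch in C# (written as a top-level program, so it assumes C# 9 or later). It copies the same 1,000,000 bytes between MemoryStreams twice, once with a 1-byte buffer and once with a 4000-byte buffer, and counts the calls to Read. CopyCountingReads is a hypothetical helper for illustration: the question's CopyStream loop plus a counter. It counts calls rather than timing them, so it shows how often the fixed per-call overhead is paid, not the real-world speedup.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration: the question's CopyStream loop,
// instrumented to count how many times Read is called.
static int CopyCountingReads(Stream input, Stream output, int bufferSize)
{
    byte[] buffer = new byte[bufferSize];
    int calls = 0;
    while (true)
    {
        calls++; // every call pays the fixed per-call overhead once
        int read = input.Read(buffer, 0, buffer.Length);
        if (read == 0)
        {
            break; // 0 means the end of the input stream was reached
        }
        output.Write(buffer, 0, read);
    }
    return calls;
}

byte[] data = new byte[1_000_000];
int oneByteCalls = CopyCountingReads(new MemoryStream(data), new MemoryStream(), 1);
int chunkCalls = CopyCountingReads(new MemoryStream(data), new MemoryStream(), 4000);
Console.WriteLine($"1-byte buffer: {oneByteCalls} Read calls");   // prints "1-byte buffer: 1000001 Read calls"
Console.WriteLine($"4000-byte buffer: {chunkCalls} Read calls");  // prints "4000-byte buffer: 251 Read calls"
```

Both counts include the final call that returns 0; that end-of-stream check is part of the cost the loop pays, but with a 4000-byte buffer it is paid 251 times instead of 1,000,001.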

Of course, you could create a buffer as big as the entire file, and get the absolute minimum overheads. However:

  • The file might be huge: bigger than the memory available to your program, so this would simply fail. Or it might be so big that the OS starts paging to virtual memory, which drastically slows things down. Breaking the data into smaller chunks means you can copy an unlimited amount of data using a small, fixed-size buffer.

  • Your OS and devices might be able to read and write data simultaneously (e.g. when copying from one physical disk drive to another). If you read all the data before writing any of it, you have to wait for the whole read to finish before you can start writing. In many cases you can instead do both operations in parallel: read a small chunk, start writing it "asynchronously" (in the background), and go back to read the next chunk while that write is in progress.

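  • You get diminishing returns. Reading 4 bytes instead of 1 may well be 4x faster. But once the per-call overhead is small compared with the cost of actually moving the data, reading 40,000 or 400,000 bytes at a time instead of 4,000 will not speed things up much (indeed, for the reasons above, larger buffers could actually slow things down).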

  • In some cases, physical devices work with specific data sizes (e.g. 512 or 4096 bytes per disk sector, 64 or 128 bytes per CPU cache line, 1500 bytes per Ethernet packet, or 8 bytes (64 bits) per transfer over a CPU bus). Dividing data into chunks that match (or are multiples of) the underlying transport/storage unit can help the hardware process the data more efficiently.
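The parallel read/write idea in the second point above can be sketched with two buffers and Stream.ReadAsync/WriteAsync: while one chunk is being written, the next chunk is read into the other buffer. This is a minimal illustration, not a tuned implementation. CopyOverlappedAsync is a hypothetical name, the code is written as a C# 9+ top-level program, and whether the read and write really overlap depends on the streams, the OS and the hardware (with the MemoryStreams used here, both operations complete synchronously).

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: double-buffered copy that reads chunk N+1
// while chunk N is still being written.
static async Task CopyOverlappedAsync(Stream input, Stream output, int bufferSize)
{
    byte[] current = new byte[bufferSize];
    byte[] next = new byte[bufferSize];

    int read = await input.ReadAsync(current, 0, current.Length);
    while (read > 0)
    {
        // Start writing the chunk we already have...
        Task writeTask = output.WriteAsync(current, 0, read);
        // ...and, without waiting for it, start reading into the other buffer.
        Task<int> readTask = input.ReadAsync(next, 0, next.Length);

        // Wait for both; 'current' must not be refilled until its write completes.
        await Task.WhenAll(writeTask, readTask);
        read = await readTask;

        // Swap buffers: the freshly read chunk is the one to write next.
        (current, next) = (next, current);
    }
}

byte[] source = new byte[100_000];
new Random(42).NextBytes(source);
var destination = new MemoryStream();
await CopyOverlappedAsync(new MemoryStream(source), destination, 16 * 1024);
Console.WriteLine(destination.ToArray().SequenceEqual(source)); // prints "True"
```

The key design point is that each buffer is refilled only after the write that was using it has finished; reading into the same buffer that is still being written would corrupt the output.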

Typically, I/O buffers of between 4 kB and 128 kB work best in most situations, but there is no "perfect" size that fits all situations, so you can tune the size to the particular operation being performed.
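Since .NET Framework 4, Stream has a built-in CopyTo method that runs the same read/write loop, and its CopyTo(Stream destination, int bufferSize) overload lets you tune the buffer size (without that argument, CopyTo chooses a default size itself). A minimal sketch, assuming a C# 9+ top-level program; CopyFileWithBuffer is a hypothetical helper, the temporary files stand in for real paths, and 64 KB is just one value from the 4 kB to 128 kB range, to be measured for your own workload rather than assumed.

```csharp
using System;
using System.IO;

// Hypothetical helper: copy one file to another with a caller-chosen buffer size.
static void CopyFileWithBuffer(string sourcePath, string destinationPath, int bufferSize)
{
    using FileStream input = File.OpenRead(sourcePath);
    using FileStream output = File.Create(destinationPath);
    // Stream.CopyTo performs the same Read/Write loop as CopyStream,
    // using a buffer of bufferSize bytes.
    input.CopyTo(output, bufferSize);
}

string src = Path.GetTempFileName();
string dst = Path.GetTempFileName();
File.WriteAllBytes(src, new byte[300_000]);
CopyFileWithBuffer(src, dst, 64 * 1024);
Console.WriteLine(new FileInfo(dst).Length); // prints "300000"
File.Delete(src);
File.Delete(dst);
```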

Note that in most I/O situations, many buffers are in use. For example, when copying data from a disk, (in simplistic terms) it is read from the disk into a read cache (buffer) in the hard drive, then sent over the interface cable to the computer's drive controller, which may also buffer the data. Then it may be transferred into RAM via an I/O buffer, where it is held until your program is ready to receive it (the OS will probably even fetch data before you ask for it, expecting you to keep reading from the same file, and buffer it so you don't have to wait). Then you read it into your buffer and write it out. From there it goes into another I/O buffer, is sent to the drive controller, passed on to the drive, and cached in the drive's write cache. Eventually the drive will actually store the data from its write cache onto the disk, and only then is your copy truly complete. Most of this happens in the background, so the data may not finish being written until many seconds after your program thinks it has finished writing and has moved on to another task. (This is why you have to "safely remove" USB drives before unplugging them: the OS may not have actually written all the data to the device yet, even many seconds after the computer said your copy operation was finished.)
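Your program cannot control every buffer in that chain, but in .NET it can ask for its own output to be pushed past the OS cache before it reports success: FileStream.Flush(true) (the flushToDisk overload, available since .NET Framework 4) writes out the FileStream's own buffer and also asks the operating system to flush its buffers for that file to the device. A minimal sketch, assuming you need a durable copy; CopyToFileDurably is a hypothetical helper written as a C# 9+ top-level program. Flushing to disk makes each copy slower, and exactly how far the data is forced toward the physical medium depends on the OS and the drive.

```csharp
using System;
using System.IO;

// Hypothetical helper: copy a stream into a file and ask the OS to
// push the data to the device before returning.
static void CopyToFileDurably(Stream input, string destinationPath)
{
    using FileStream output = new FileStream(destinationPath, FileMode.Create, FileAccess.Write);
    input.CopyTo(output);
    // Flush() (and Dispose) only move FileStream's own buffer into the OS;
    // Flush(true) additionally asks the OS to write its cached data for this file to the device.
    output.Flush(flushToDisk: true);
}

string path = Path.GetTempFileName();
CopyToFileDurably(new MemoryStream(new byte[] { 1, 2, 3 }), path);
Console.WriteLine(File.ReadAllBytes(path).Length); // prints "3"
File.Delete(path);
```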

answered Sep 23 '22 at 18:09 by Jason Williams