 

Optimum file buffer read size?

I am writing an application that needs to read fairly large files, and I have always wondered what the optimum size is for the read buffer on a modern Windows XP computer. I googled and found many examples that used 1024 as the optimum size.

Here is a snippet of what I mean:

long pointer = 0;
byte[] buffer = new byte[1024]; // What's a good size here?
while (pointer < input.Length)
{
    int read = input.Read(buffer, 0, buffer.Length);
    if (read == 0) break; // end of stream reached early
    pointer += read;
}

My application is fairly simple, so I am not looking to write any benchmarking code, but I would like to know what sizes are common.

asked Oct 11 '09 by Andrew Keith

2 Answers

A 1k buffer size seems a bit small. Generally, there is no "one size fits all" buffer size; you need to pick one that fits the behavior of your algorithm. A really huge buffer is usually not a good idea, but one that is too small, or out of line with how you process each chunk, is not great either.

If you are simply reading data one chunk after another entirely into memory before processing it, I would use a larger buffer, probably 8k or 16k, but not larger.
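As a sketch of that first case (the 16k figure and the ReadAllData name are illustrative, not a recommendation), reading a whole file into memory chunk by chunk might look like:

using System.IO;

// Sketch: read an entire file into memory in 16k chunks.
static byte[] ReadAllData(string path)
{
    using (FileStream input = File.OpenRead(path))
    using (MemoryStream output = new MemoryStream())
    {
        byte[] buffer = new byte[16 * 1024]; // 16k chunk size (an assumption)
        int read;
        // Stream.Read returns 0 only at the end of the stream.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
        return output.ToArray();
    }
}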

On the other hand, if you are processing the data in a streaming fashion, reading a chunk and processing it before reading the next, smaller buffers may be more useful. Better still, if the streamed data has structure, I would match the amount of data read to the type of data being read. For example, if you are reading binary data that contains a 4-character code, a float, and a string, I would read the 4-character code into a 4-byte array, and the float likewise. Then I would read the length of the string and create a buffer to read the whole chunk of string data at once.
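As a rough sketch (this exact record layout is hypothetical, and a production reader would loop until each Read call has actually filled its array), that hand-rolled approach could look like:

using System;
using System.IO;
using System.Text;

// Sketch: read one structured record, sizing each read to the field:
// a 4-character code, a float, then a length-prefixed string.
static void ReadRecord(Stream input)
{
    byte[] code = new byte[4];
    input.Read(code, 0, 4);                // 4-character code

    byte[] floatBytes = new byte[4];
    input.Read(floatBytes, 0, 4);
    float value = BitConverter.ToSingle(floatBytes, 0);

    byte[] lengthBytes = new byte[4];
    input.Read(lengthBytes, 0, 4);
    int length = BitConverter.ToInt32(lengthBytes, 0);

    byte[] stringBytes = new byte[length]; // one buffer for the whole string
    input.Read(stringBytes, 0, length);
    string text = Encoding.ASCII.GetString(stringBytes);
}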

If you are doing streaming data processing, I would look into the BinaryReader and BinaryWriter classes. These let you work with binary data very easily without having to worry much about the raw bytes, and they also let you decouple your buffer size from the actual data you are working with. You could set a 16k buffer on the underlying stream and still read individual data values with the BinaryReader with ease.
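A minimal sketch of that combination, carrying over the hypothetical record layout from above ("data.bin" is a placeholder, and BinaryReader.ReadString assumes the string was written with BinaryWriter's own length-prefixed format):

using System.IO;

// Sketch: a 16k buffer on the underlying stream, with BinaryReader
// pulling individual values off the top.
using (FileStream file = File.OpenRead("data.bin"))
using (BufferedStream buffered = new BufferedStream(file, 16 * 1024))
using (BinaryReader reader = new BinaryReader(buffered))
{
    char[] code = reader.ReadChars(4); // 4-character code
    float value = reader.ReadSingle(); // 4-byte float
    string text = reader.ReadString(); // BinaryWriter-style length-prefixed string
}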

answered by jrista

It depends on where you draw the line between access time and memory usage. The larger the buffer, the faster the read, but the more memory it costs. Reading in multiples of your file system's cluster size is probably the most efficient; on a Windows XP system using NTFS, the default cluster size is 4K.

See Microsoft's article "Default cluster size for NTFS, FAT, and exFAT".
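A sketch along those lines, assuming the default 4K NTFS cluster size (check the real value on the target volume; "data.bin" is a placeholder path):

using System.IO;

const int ClusterSize = 4096;              // assumed NTFS default; verify per volume
byte[] buffer = new byte[4 * ClusterSize]; // read in multiples of the cluster size

using (FileStream input = new FileStream("data.bin", FileMode.Open,
    FileAccess.Read, FileShare.Read, ClusterSize)) // internal buffer = one cluster
{
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        // process buffer[0..read) here
    }
}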

Bye.

answered by RRUZ