
Optimal Buffer size for read-process-write

In my function, I need to read some data from a file into a buffer, manipulate the data and write it back to another file. The file is of unknown size and may be very large.

If I use a small buffer, there will be many read/write cycles and the operation will take a long time. A large buffer, on the other hand, means consuming more memory. What is the optimal buffer size I should use? Is this case dependent?

I have seen applications like 'Tera copy' on Windows that manage huge files efficiently. Are there any other techniques or mechanisms I should be aware of?

Note: This program will be running under Windows.

Dipto asked Mar 21 '13 06:03


4 Answers

See what Microsoft has to say about IO size: http://technet.microsoft.com/en-us/library/cc938632.aspx. Basically, they say you should probably do IO in 64K blocks.

On *NIX platforms, struct stat has an st_blksize member which gives the preferred block size for I/O on that file.
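On POSIX systems this could be queried with a small helper like the following sketch (preferred_block_size is just an illustrative name):

```c
#include <sys/stat.h>

/* Return the filesystem's preferred I/O block size for path,
 * or -1 if the path cannot be stat'ed. */
long preferred_block_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_blksize;
}
```

On most Linux filesystems this reports 4096, but network or exotic filesystems may prefer much larger blocks, which is exactly why it is worth asking rather than hard-coding a size.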

wilx answered Sep 23 '22 02:09


It is, indeed, highly case dependent, and you should probably just write your program to be able to handle a flexible buffer size, and then try out what size is optimal.

If you start small and then increase your buffer size, you will probably reach a certain size after which you'll see no or extremely small performance gains, since the CPU is spending most of its time running your code, and the overhead from the I/O has become negligible.
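Such an experiment could be sketched like this, with plain stdio and hypothetical file names; note that clock() is only a rough timer (on POSIX it measures CPU time), so a wall-clock API is better for serious I/O measurements:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Copy src to dst using a bufsize-byte buffer and return the elapsed
 * time in seconds, or -1.0 on error. */
double copy_with_buffer(const char *src, const char *dst, size_t bufsize)
{
    FILE *in = fopen(src, "rb");
    FILE *out = fopen(dst, "wb");
    char *buf = malloc(bufsize);
    double elapsed = -1.0;
    size_t n;

    if (in && out && buf) {
        clock_t start = clock();
        while ((n = fread(buf, 1, bufsize, in)) > 0)
            fwrite(buf, 1, n, out);
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    free(buf);
    if (in) fclose(in);
    if (out) fclose(out);
    return elapsed;
}

/* Try a few sizes and print the timings. */
void benchmark_sizes(const char *src, const char *dst)
{
    size_t sizes[] = { 4096, 65536, 1 << 20 };  /* 4 KB, 64 KB, 1 MB */
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        printf("%8zu bytes: %.3f s\n",
               sizes[i], copy_with_buffer(src, dst, sizes[i]));
}
```

Run it against a file large enough to dominate the constant overheads; the curve usually flattens out somewhere between tens of kilobytes and a few megabytes.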

Dolda2000 answered Sep 26 '22 02:09


The first rule for these things is to benchmark. My guess would be that you are prematurely optimizing. If you are doing real file I/O, the bandwidth of your disk (or whatever) will usually be the bottleneck. As long as you write your data in chunks of several pages, the performance shouldn't change much.

What you could hope for is to do your computation on parts of the data in parallel with your write operation. For this you would have to keep two buffers: one that is currently being written, and one on which you do the processing. Then you would use asynchronous I/O functions (aio_write on POSIX systems; something similar probably exists for Windows, too) and switch buffers on each iteration.
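A rough sketch of that double-buffering scheme with POSIX AIO (link with -lrt on Linux) might look like this; process() stands in for whatever manipulation the question has in mind, here just uppercasing:

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BUFSIZE 65536

/* Placeholder for the real data manipulation. */
static void process(char *buf, ssize_t n)
{
    for (ssize_t i = 0; i < n; i++)
        if (buf[i] >= 'a' && buf[i] <= 'z')
            buf[i] -= 32;
}

/* Read src, process each chunk, and write it to dst, overlapping the
 * read+process of one buffer with the asynchronous write of the other.
 * Returns 0 on success, -1 on error. */
int copy_process(const char *src, const char *dst)
{
    static char bufs[2][BUFSIZE];
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    struct aiocb cb;
    const struct aiocb *list[1] = { &cb };
    int cur = 0, writing = 0;
    off_t offset = 0;
    ssize_t n;

    if (in < 0 || out < 0)
        return -1;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = out;

    while ((n = read(in, bufs[cur], BUFSIZE)) > 0) {
        process(bufs[cur], n);          /* overlaps the previous write */
        if (writing) {                  /* wait for the previous aio_write */
            while (aio_error(&cb) == EINPROGRESS)
                aio_suspend(list, 1, NULL);
            offset += aio_return(&cb);
        }
        cb.aio_buf = bufs[cur];
        cb.aio_nbytes = (size_t)n;
        cb.aio_offset = offset;
        if (aio_write(&cb) != 0)
            return -1;
        writing = 1;
        cur = 1 - cur;                  /* switch buffers */
    }
    if (writing) {                      /* drain the last write */
        while (aio_error(&cb) == EINPROGRESS)
            aio_suspend(list, 1, NULL);
        aio_return(&cb);
    }
    close(in);
    close(out);
    return 0;
}
```

On Windows the analogous mechanism would be overlapped I/O (WriteFile with an OVERLAPPED structure), but the buffer-swapping pattern is the same.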

Jens Gustedt answered Sep 24 '22 02:09


Memory management is always case dependent and particularly when combined with file I/O.

I have two suggestions.

1) Use a fixed I/O buffer size, e.g. 64 KB, 256 KB, 512 KB or 1 MB. In this case, when the file is larger than the fixed buffer size, you have to track offsets and complete the I/O in multiple iterations.

2) Use a variable I/O buffer size allocated with malloc(). This depends on factors such as the RAM available in your system and the per-process dynamic memory allocation limit of your OS.
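Suggestion 1 could look roughly like this sketch, where chunked_copy is an illustrative name and the 64 KB size is just an example; the file position advances with each fread/fwrite, so arbitrarily large files are handled in multiple iterations:

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK (64 * 1024)   /* fixed 64 KB buffer; tune as needed */

/* Copy src to dst in CHUNK-sized pieces.
 * Returns 0 on success, -1 on error. */
int chunked_copy(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    FILE *out = fopen(dst, "wb");
    char *buf = malloc(CHUNK);
    size_t n;
    int rc = -1;

    if (in && out && buf) {
        rc = 0;
        while ((n = fread(buf, 1, CHUNK, in)) > 0) {
            if (fwrite(buf, 1, n, out) != n) {
                rc = -1;            /* short write: disk full, etc. */
                break;
            }
        }
        if (ferror(in))
            rc = -1;
    }
    free(buf);
    if (in) fclose(in);
    if (out) fclose(out);
    return rc;
}
```

Because the buffer is malloc'ed rather than a fixed array, switching to suggestion 2 is just a matter of computing the size at runtime before the allocation.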

Kinjal Patel answered Sep 26 '22 02:09