 

File I/O with streams - best memory buffer size

I am writing a small I/O library to assist with a larger (hobby) project. Part of this library performs various functions on a file, which is read / written via the FileStream object. On each StreamReader.Read(...) pass, I fire off an event which will be used in the main app to display progress information. The processing that goes on in the loop is varied, but not too time-consuming (it could just be a simple file copy, for example, or may involve encryption...).

My main question is: what is the best memory buffer size to use? Thinking about physical disk layouts, I could pick 2 KB, which would cover a CD sector size and is a nice multiple of a 512-byte hard disk sector. Higher up the abstraction tree, you could go for a larger buffer which could read an entire FAT cluster at a time. I realise that with today's PCs I could go for a more memory-hungry option (a couple of MiB, for example), but then I increase the time between UI updates and the user perceives a less responsive application.
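For illustration, the loop is shaped roughly like this; the BlockProcessed event and the 4096-byte default are placeholders for this question, not the actual library code:

```csharp
using System;
using System.IO;

class BufferedCopier
{
    // Hypothetical progress event for this question: (bytesDone, bytesTotal).
    public event Action<long, long> BlockProcessed;

    public void Copy(string sourcePath, string destPath, int bufferSize = 4096)
    {
        var buffer = new byte[bufferSize];

        using (var input = new FileStream(sourcePath, FileMode.Open, FileAccess.Read))
        using (var output = new FileStream(destPath, FileMode.Create, FileAccess.Write))
        {
            long total = input.Length, done = 0;
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);        // or encrypt, hash, etc.
                done += read;
                BlockProcessed?.Invoke(done, total);  // per-block progress update
            }
        }
    }
}
```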

As an aside, I'm eventually hoping to provide a similar interface to files hosted on FTP / HTTP servers (over a local network / fast-ish DSL). What would be the best memory buffer size for those (again, a best-case trade-off between perceived responsiveness and performance)?

Asked Jun 13 '10 by AJ.


3 Answers

Files are already buffered by the file system cache. You just need to pick a buffer size that doesn't force FileStream to make the native Windows ReadFile() API call to fill the buffer too often. Don't go below a kilobyte; more than 16 KB is a waste of memory and unfriendly to the CPU's L1 cache (typically 16 or 32 KB of data).

4 KB is a traditional choice, even though the buffer will only ever line up exactly with a virtual memory page by accident. This is difficult to profile; you'll end up measuring how long it takes to read a cached file, which runs at RAM speeds (5 gigabytes/sec and up) once the data is available in the cache. It will be in the cache the second time you run your test, and that won't happen in a production environment too often. File I/O is completely dominated by the disk drive or the NIC and is glacially slow; copying the data is peanuts. 4 KB will work fine.
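For example, the buffer being sized here is FileStream's own internal one, set through its constructor (the path below is just an example):

```csharp
using System.IO;

class Example
{
    static void Main()
    {
        // bufferSize: 4096 means FileStream only makes a native ReadFile() call
        // once per 4 KB consumed, however small the individual Read() calls are.
        using (var fs = new FileStream(@"C:\temp\data.bin", FileMode.Open,
                                       FileAccess.Read, FileShare.Read,
                                       bufferSize: 4096))
        {
            // read / process as usual, raising progress events per block
        }
    }
}
```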

Answered Sep 20 '22 by Hans Passant


When I deal with files directly through a stream object, I typically use 4096 bytes. It seems to be reasonably effective across multiple I/O areas (local file system, LAN/SMB, network stream, etc.), but I haven't profiled it or anything. Way back when, I saw several examples use that size, and it stuck in my memory. That doesn't mean it's the best though.
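For instance, the same 4096 figure can be passed straight to Stream.CopyTo, which works the same way whether the underlying stream is a local file, an SMB share or a network stream (the paths below are made up):

```csharp
using System.IO;

class CopyTo4K
{
    static void Main()
    {
        using (var source = File.OpenRead(@"C:\temp\in.dat"))
        using (var dest = File.Create(@"C:\temp\out.dat"))
        {
            source.CopyTo(dest, 4096);  // explicit 4096-byte copy buffer
        }
    }
}
```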

Answered Sep 18 '22 by Nate


"It depends".

You would have to test your application with different buffer sizes to determine which is best. You can't guess ahead of time.
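A rough sketch of such a test, assuming a representative file at a made-up path; as another answer notes, the file system cache will skew repeat runs, so measure accordingly:

```csharp
using System;
using System.Diagnostics;
using System.IO;

class BufferSizeBenchmark
{
    static void Main()
    {
        // Hypothetical test file; use something representative of the real workload.
        const string path = @"C:\temp\testfile.bin";

        foreach (int size in new[] { 1024, 4096, 16384, 65536, 1 << 20 })
        {
            var buffer = new byte[size];
            var sw = Stopwatch.StartNew();

            using (var input = new FileStream(path, FileMode.Open, FileAccess.Read))
            {
                while (input.Read(buffer, 0, buffer.Length) > 0) { }
            }

            sw.Stop();
            // After the first pass the file sits in the file system cache,
            // so repeat runs (or a cold start) are needed for honest numbers.
            Console.WriteLine("{0,8} bytes: {1} ms", size, sw.ElapsedMilliseconds);
        }
    }
}
```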

Answered Sep 20 '22 by John Saunders