I have a C app (Visual Studio 2010, Windows 7 64-bit) running on a machine with dual Xeon chips, meaning 12 physical and 24 logical cores, and 192 GB of RAM. EDIT: The OS is Windows 7, 64-bit.
The app has 24 threads (each thread has its own logical core) doing calculations and filling up a different part of a massive C structure. The structure, when all the threads are finished (and the threads are all perfectly balanced so they complete at the same time), is about 60 gigabytes.
(I have control over the hardware setup, so I am going to be using six 2 TB drives in RAID 0, which means the physical limit on writing will be approximately 6x the average sequential write speed of one drive, or about 2 GB/second.)
What is the most efficient way to get this to disk? Obviously, the I/O time will dwarf the compute time. From my research on this topic, it seems like write() (as opposed to fwrite()) is the way to go. But what other optimizations can I make on the software side, in terms of setting buffer sizes, etc.? Would mmap be more efficient?
mmap(), or Boost's memory-mapped file support, is almost always the best approach. The OS is smarter than you; let it worry about what to cache!
You didn't say which OS, but on Linux madvise(), or the equivalent Boost hints, can really improve performance.
It is hard to judge the best thing for your situation.
The first optimization to make is to preallocate the file. That way your file system does not need to keep extending its size. That should optimize some disk operations. However, avoid writing actual zeros to disk. Just set the length.
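As a rough illustration of "just set the length", here is a minimal POSIX sketch. The helper name create_preallocated is made up for this example, and it assumes a Unix-like system; on Windows the corresponding Win32 calls would be SetFilePointerEx followed by SetEndOfFile.

```c
#define _FILE_OFFSET_BITS 64      /* 64-bit off_t, needed for a ~60 GB file */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) `path` and set its length to
 * `size` bytes without writing any data.
 * Returns an open read/write descriptor, or -1 on failure. */
int create_preallocated(const char *path, off_t size)
{
    assert(size >= 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* ftruncate() only sets the logical length; no zeros are written.
     * Many filesystems leave the file sparse at this point. */
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

If you want the blocks reserved up front as well, rather than only the length set, posix_fallocate() is the call to look at; whether that helps depends on the filesystem.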
Then you have choices between mmap and write. This also depends on the operating system you use. On a Unix I would try both mmap and pwrite. pwrite is useful because each of your threads can write into the file at the desired file position without fighting over file offsets.
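To make the pwrite idea concrete, here is a small sketch. write_slice is a hypothetical helper, not something from the question; each worker thread would call it with a pointer to its own part of the structure and that part's byte offset in the file.

```c
#define _FILE_OFFSET_BITS 64      /* 64-bit off_t for offsets beyond 2 GB */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>                /* open() flags, for the caller */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes from `buf` at absolute file
 * position `offset`. pwrite() takes the position as an argument and does
 * not move the descriptor's shared file offset, so several threads can
 * call this on the same fd at the same time.
 * Returns 0 on success, -1 on error. */
int write_slice(int fd, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;

    assert(offset >= 0);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;
        }
        /* pwrite() may write fewer bytes than requested: advance and loop. */
        p      += n;
        len    -= (size_t)n;
        offset += n;
    }
    return 0;
}
```

With 24 equally sized parts, thread i would call something like write_slice(fd, part_i, part_len, (off_t)i * part_len). Splitting each part into a few large calls (tens of megabytes, say) is a reasonable starting point to benchmark against, not a measured optimum.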
mmap could be good because instead of making copies into the file cache, your threads would be writing directly into the file cache. 60 GB may be too large to map in one piece (it certainly is in a 32-bit process), so each thread will likely need its own mmap window onto the file which it can move around.
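A per-thread window might look roughly like the following POSIX sketch. The function name write_via_window is invented for illustration, and it assumes the file has already been extended to its final length, since a mapping cannot grow the file.

```c
#define _FILE_OFFSET_BITS 64      /* 64-bit off_t for offsets beyond 2 GB */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>                /* open() flags, for the caller */
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes from `src` into the file at
 * `offset` through a temporary shared mapping (the "window").
 * The file must already be at least offset + len bytes long.
 * Returns 0 on success, -1 on error. */
int write_via_window(int fd, const void *src, size_t len, off_t offset)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && len > 0 && offset >= 0);

    /* mmap() offsets must be page-aligned: round the start down and
     * remember how far into the window the data really begins. */
    off_t  base  = offset - (offset % page);
    size_t slack = (size_t)(offset - base);
    size_t span  = slack + len;

    char *win = mmap(NULL, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    if (win == MAP_FAILED)
        return -1;

    /* Stores into the window land in the page cache; the kernel writes the
     * dirty pages back later (msync() could be used to force that). */
    memcpy(win + slack, src, len);

    return munmap(win, span) == 0 ? 0 : -1;
}
```

The memcpy is only there to keep the sketch short. The real benefit described above comes from having each thread compute its results straight into its window, so there is no second copy at all.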
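On Windows you would probably want to try overlapped (asynchronous) I/O. That can only be done with Win32 API calls: open the file with CreateFile and FILE_FLAG_OVERLAPPED, then issue WriteFile calls with an OVERLAPPED structure that carries the file offset.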