In my program I am simulating an N-body system for a large number of iterations. Each iteration produces a set of 6N coordinates, which I need to append to a file and then use for the next iteration. The code is written in C++ and currently uses `ofstream`'s `write()` method to write the data in binary format at each iteration.
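For reference, the write path currently looks roughly like this (the file name, body count, and iteration count are placeholders):

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

int main() {
    const std::size_t N = 1000;         // placeholder body count
    std::vector<double> coords(6 * N);  // 6N coordinates per iteration

    std::ofstream out("trajectory.bin", std::ios::binary);
    for (int iter = 0; iter < 10000; ++iter) {
        // ... advance the simulation, filling coords ...
        out.write(reinterpret_cast<const char*>(coords.data()),
                  coords.size() * sizeof(double));
    }
}
```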
I am not an expert in this field, but I would like to improve this part of the program, since I am in the process of optimizing the whole code. I feel that the latency associated with writing the result of the computation at each cycle significantly slows down the performance of the software.
I'm confused because I have no experience with actual parallel programming or low-level file I/O. I have thought of some techniques I imagine I could implement, since I am programming for modern (possibly multi-core) machines running Unix:
`mmap` (the file size could be huge, on the order of GBs; is this approach robust enough?)

However, I don't know how best to implement these techniques and combine them appropriately.
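To make the idea concrete, this is roughly the kind of `mmap` usage I have in mind (a hypothetical sketch, not working code from my program: the file is grown with `ftruncate` and each iteration's chunk is mapped at a page-aligned offset):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    const std::size_t N = 1000;  // placeholder body count
    std::vector<double> coords(6 * N, 0.0);
    const std::size_t chunk = coords.size() * sizeof(double);

    int fd = open("trajectory.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const long page = sysconf(_SC_PAGESIZE);
    off_t offset = 0;
    for (int iter = 0; iter < 100; ++iter) {
        // ... advance the simulation, filling coords ...

        // Grow the file so the next chunk has backing storage.
        if (ftruncate(fd, offset + chunk) != 0) { perror("ftruncate"); return 1; }

        // mmap requires a page-aligned file offset, so map from the
        // containing page and write past the padding.
        off_t aligned = (offset / page) * page;
        std::size_t pad = static_cast<std::size_t>(offset - aligned);
        void* p = mmap(nullptr, pad + chunk, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, aligned);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        std::memcpy(static_cast<char*>(p) + pad, coords.data(), chunk);
        munmap(p, pad + chunk);
        offset += chunk;
    }
    close(fd);
}
```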
Of course, writing to a file at each iteration is inefficient and will most likely slow down your computation (as a rule of thumb; it depends on your actual case).
You have to use a producer -> consumer design pattern. They will be linked by a queue, like a conveyor belt.
By splitting the two, you can increase performance more easily, because each process is simpler and suffers less interference from the other.
There is no need to optimize both; only optimize the slower one (the bottleneck).
Practically, you use threads and a synchronized queue between them. For implementation hints, have a look here, especially §18.12 "The Producer-Consumer Pattern".
About flow management: you'll have to add a little more complexity by selecting a maximum queue size and making the producer(s) wait if the queue does not have enough space. Beware of deadlocks then; code it carefully (see the Wikipedia link I gave about that). A bounded queue along these lines is sketched below.
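Here is a minimal sketch of such a bounded, synchronized queue, assuming a C++11 compiler (the Boost equivalents are nearly identical). `push()` blocks while the queue is full and `pop()` blocks while it is empty, so producer and consumer throttle each other without busy-waiting:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t max_size) : max_size_(max_size) {}

    // Blocks while the queue is full (this is the flow management).
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return queue_.size() < max_size_; });
        queue_.push(std::move(item));
        not_empty_.notify_one();
    }

    // Blocks while the queue is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop();
        not_full_.notify_one();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<T> queue_;
    const std::size_t max_size_;
};
```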
Note: it's a good idea to use Boost threads, because native threads are not very portable (they are standardized since C++0x, but C++0x compiler support is not yet widespread).
It's better to split the operation into two independent processes: data-producing and file-writing. Data-producing would use a buffer to pass each iteration's data along, and file-writing would use a queue to store write requests. Data-producing would then just post a write request and go on, while file-writing would cope with the writing in the background. A sketch of how the two halves could be wired together follows.
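A hypothetical wiring of the two halves, reusing the `BoundedQueue` sketch from the previous answer (the file name, sizes, and the empty-vector shutdown sentinel are illustrative choices, not a fixed API):

```cpp
#include <fstream>
#include <thread>
#include <vector>
// ... plus the BoundedQueue sketch shown above ...

int main() {
    const std::size_t N = 1000;                     // placeholder body count
    BoundedQueue<std::vector<double>> requests(8);  // bounded to cap memory use

    // Background writer: drains write requests and appends them to the file.
    std::thread writer([&requests] {
        std::ofstream out("trajectory.bin", std::ios::binary);
        for (;;) {
            std::vector<double> chunk = requests.pop();
            if (chunk.empty()) break;  // empty vector = shutdown sentinel
            out.write(reinterpret_cast<const char*>(chunk.data()),
                      chunk.size() * sizeof(double));
        }
    });

    // Producer: posts each iteration's coordinates and immediately continues.
    for (int iter = 0; iter < 100; ++iter) {
        std::vector<double> coords(6 * N);
        // ... run one iteration, filling coords ...
        requests.push(std::move(coords));
    }
    requests.push({});  // signal completion to the writer
    writer.join();
}
```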
Essentially, if the data is produced much faster than it can possibly be stored, you'll quickly end up holding most of it in the buffer. In that case your current approach seems quite reasonable as is, since little can be done programmatically to improve the situation.