I have a long loop in which I need to write some data to a file on every iteration. The problem is that writing to a file can be slow, so I would like to reduce the time this takes by doing the writing asynchronously.
Does anyone know a good way to do this? Should I create a thread that consumes whatever is put into its buffer by writing it out (in this case, a single producer, single consumer)?
I am interested mostly in solutions that don't involve anything but the standard library (C++11).
Before going into asynchronous writing, if you are using IOStreams you might want to avoid flushing the stream accidentally, e.g., by using '\n' instead of std::endl (std::endl writes a newline and then flushes the stream). Since writing to IOStreams is buffered, this can improve performance quite a bit.
If that's not sufficient, the next question is how the data is written. If there is a lot of formatting going on, there is a chance that the actual formatting takes most of the time. You might be able to move the formatting into a separate thread, but that's quite different from merely handing off the writing of a few bytes to another thread: you'd need to pass on a suitable data structure holding the data to be formatted. What is suitable depends on what you are actually writing, though.
Finally, if writing the buffers to a file is really the bottleneck and you want to stick with the standard C++ library, it may be reasonable to have a writer thread which listens on a queue filled with buffers from a suitable stream buffer and writes the buffers to an std::ofstream: the producer interface would be an std::ostream which sends off buffers, probably of fixed size, to the queue on which the other thread listens, either when the buffer is full or when the stream is flushed (for which I'd use std::flush explicitly). Below is a quick implementation of that idea using only standard library facilities:
```cpp
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <ostream>
#include <queue>
#include <streambuf>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// A stream buffer that hands full (or flushed) buffers to a writer thread.
struct async_buf
    : std::streambuf
{
    std::ofstream out;
    std::mutex mutex;
    std::condition_variable condition;
    std::queue<std::vector<char>> queue;
    std::vector<char> buffer;
    bool done;
    std::thread thread;  // declared last: started after everything it uses

    // Writer thread: takes buffers off the queue and writes them to `out`.
    void worker() {
        bool local_done(false);
        std::vector<char> buf;
        while (!local_done) {
            {
                std::unique_lock<std::mutex> guard(this->mutex);
                this->condition.wait(guard,
                                     [this](){ return !this->queue.empty()
                                                   || this->done; });
                if (!this->queue.empty()) {
                    buf.swap(queue.front());
                    queue.pop();
                }
                local_done = this->queue.empty() && this->done;
            }
            // Write outside the lock so the producer is not blocked.
            if (!buf.empty()) {
                out.write(buf.data(), std::streamsize(buf.size()));
                buf.clear();
            }
        }
        out.flush();
    }
public:
    async_buf(std::string const& name)
        : out(name)
        , buffer(128)
        , done(false)
        , thread(&async_buf::worker, this) {
        // Keep one slot free so overflow() can store the overflowing char.
        this->setp(this->buffer.data(),
                   this->buffer.data() + this->buffer.size() - 1);
    }
    ~async_buf() {
        this->sync();  // hand over characters that were never flushed
        {
            std::lock_guard<std::mutex> guard(this->mutex);
            this->done = true;
        }
        this->condition.notify_one();
        this->thread.join();
    }
    int overflow(int c) override {
        if (c != std::char_traits<char>::eof()) {
            *this->pptr() = std::char_traits<char>::to_char_type(c);
            this->pbump(1);
        }
        return this->sync() != -1
            ? std::char_traits<char>::not_eof(c): std::char_traits<char>::eof();
    }
    // Moves the filled part of the buffer onto the queue and starts a new one.
    int sync() override {
        if (this->pbase() != this->pptr()) {
            this->buffer.resize(std::size_t(this->pptr() - this->pbase()));
            {
                std::unique_lock<std::mutex> guard(this->mutex);
                this->queue.push(std::move(this->buffer));
            }
            this->condition.notify_one();
            this->buffer = std::vector<char>(128);
            this->setp(this->buffer.data(),
                       this->buffer.data() + this->buffer.size() - 1);
        }
        return 0;
    }
};

int main()
{
    async_buf sbuf("async.out");
    std::ostream astream(&sbuf);
    std::ifstream in("async_stream.cpp");
    for (std::string line; std::getline(in, line); ) {
        astream << line << '\n' << std::flush;
    }
}
```
Search the web for "double buffering."
In general, one thread will write to one or more buffers. Another thread reads from the buffers, "chasing" the writing thread.
This may not make your program more efficient. Efficiency with files is achieved by writing in large blocks: one write of many bytes is more efficient than many writes of a few bytes, because every write call carries a fixed overhead regardless of how much it writes.
This could be achieved by having the writing thread write only when the buffer content has exceeded some threshold, such as 1 KiB.
Also research the topic of "spooling" or "print spooling".
You'll need to use C++11, since earlier versions don't have threading support in the standard library. I don't know why you limit yourself, though, since Boost has some good stuff in it.