Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to safely flush a buffer from a different thread, without synchronized methods?

There are multiple threads, say B, C and D, each writing small packets of data to a buffer at a high frequency. They own their buffer and nobody else ever writes to it. Writing must be as fast as possible, and I've determined that using synchronized makes it unacceptably slow.

The buffers are simply byte arrays, along with the index of the first free element:

byte[] buffer;
int index;

public void write(byte[] data) {
    // some checking that the buffer won't overflow... not important now
    System.arraycopy(data, 0, buffer, index, data.length);
    index += data.length;
}

Every once in a while, thread A comes along to flush everybody's buffer to a file. It's okay if this part has some overhead, so using synchronized here is no problem.

Now the trouble is, that some other thread might be writing to a buffer, while thread A is flushing it. This means that two threads attempt to write to index around the same time. That would lead to data corruption, which I would like to prevent, but without using synchronized in the write() method.

I've got the feeling that, using the right order of operations and probably some volatile fields, this must be possible. Any bright ideas?

like image 479
Thomas Avatar asked Feb 12 '11 09:02

Thomas


2 Answers

Have you tried a solution which uses synchronization, and found it doesn't perform well enough? You say you've determined that it's unacceptably slow - how slow was it, and do you already have a performance budget? Normally, obtaining an uncontested lock is extremely cheap, so I wouldn't expect it to be a problem.

There may well be some clever lock-free solution - but it's likely to be significantly more complicated than just synchronizing whenever you need to access shared data. I understand that lock-free coding is all the rage, and scales beautifully when you can do it - but if you've got one thread interfering with another's data, it's very hard to do it safely. Just to be clear, I like using lock-free code when I can use high-level abstractions created by experts - things like the Parallel Extensions in .NET 4. I just don't like working with low-level abstractions like volatile variables if I can help it.

Try locking, and benchmark it. Work out what performance is acceptable, and compare the performance of a simple solution with that goal.

Of course, one option is redesigning... does the flushing have to happen actively in a different thread? Could the individual writer threads not just hand off the buffer to the flushing thread (and start a different buffer) periodically? That would make things a lot simpler.

EDIT: Regarding your "flush signal" idea - I'd been thinking along similar lines. But you need to be careful about how you do it so that the signal can't get lost even if one thread takes a long time to process whatever it's doing. I suggest you make thread A publish a "flush counter"... and each thread keeps its own counter of when it last flushed.

EDIT: Just realized this is Java, not C# - updated :)

Use AtomicLong.incrementAndGet() to increment from thread A, and AtomicLong.get() to read from the other threads. Then in each thread, compare whether you're "up to date", and flush if necessary:

private long lastFlush; // Last counter for our flush
private Flusher flusher; // The single flusher used by all threads 

public void write(...)
{
    long latestFlush = flusher.getCount(); // Will use AtomicLong.get() internally
    if (latestFlush > lastFlush)
    {
        flusher.Flush(data);
        // Do whatever else you need
        lastFlush = latestFlush; // Don't use flusher.getCount() here!
    }
    // Now do the normal write
}

Note that this assumes you only ever need to check for flushing in the Write method. Obviously that may not be the case, but hopefully you can adapt the idea.

like image 73
Jon Skeet Avatar answered Oct 29 '22 04:10

Jon Skeet


You can use volatile alone to safely read/write to a buffer (if you have only one writer) however, only one thread can safely flush the data. To do this you can use a ring buffer.

I would add to @Jon's comment that this is significantly more complicated to test. e.g. I had one "solution" which worked for 1 billion messages consistently one day but kept breaking the next because the box was more loaded.

With synchronized your latency should be below 2 micro-seconds. With Lock, you could get this down to 1 micro-second. with busy waiting on a volatile you can get this down to 3-6 ns per byte (The time it takes to transfer data between threads becomes important)

Note: as the volume of data increases the relative cost of the lock becomes less important. e.g. if you are typically writing 200 bytes or more I wouldn't worry about the difference.

One approach I take is to use the Exchanger with two direct ByteBuffers and avoid writing any data in the critical path (i.e. only write the data after I have processed everything and it doesn't matter so much)

like image 39
Peter Lawrey Avatar answered Oct 29 '22 04:10

Peter Lawrey