How to declare and use a "one writer, many readers, one process, simple type" variable?

I have a really simple question. I have a simple-type variable (like an int), one process, one writer thread, and several "read-only" threads. How should I declare the variable?

  • volatile int
  • std::atomic<int>
  • int

I expect that when the "writer" thread modifies the value, all "reader" threads see the fresh value ASAP.

It's OK to read and write the variable at the same time, but I expect a reader to obtain either the old value or the new value, not some "intermediate" value.

I'm using a single-CPU Xeon E5 v3 machine. I do not need to be portable; I run the code only on this server and compile with -march=native -mtune=native. Performance is very important, so I do not want to add "synchronization overhead" unless absolutely required.


If I just use int and one thread writes the value, is it possible that in another thread I do not see the "fresh" value for a while?

asked Apr 29 '14 by Oleg Vazhnev

3 Answers

Just use std::atomic.

Don't use volatile, and don't use a plain int; neither gives the necessary synchronisation. Modifying a plain int in one thread and accessing it from another without synchronisation gives undefined behaviour.
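A minimal sketch of what that looks like for the question's scenario (the thread count, loop bounds, and variable names are illustrative, not taken from the question):

#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<int> value{0};   // shared variable: one writer, many readers

void writer() {
    for (int i = 1; i <= 100; ++i)
        value.store(i);              // sequentially consistent by default
}

void reader(int id) {
    int last = 0;
    while (last < 100)
        last = value.load();         // sees either an old or the new value, never a torn one
    std::printf("reader %d saw final value %d\n", id, last);
}

int main() {
    std::vector<std::thread> readers;
    for (int i = 0; i < 4; ++i)
        readers.emplace_back(reader, i);
    std::thread w(writer);
    w.join();
    for (auto& t : readers) t.join();
}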

answered by Mike Seymour

If you have unsynchronized access to a variable where you have one or more writers, then your program has undefined behavior. Somehow you have to guarantee that while a write is happening, no other write or read can happen. This is called synchronization. How you achieve this synchronization depends on the application.

For something like this, where we have one writer and several readers and are using a TriviallyCopyable datatype, a std::atomic<> will work. The atomic variable makes sure under the hood that only one thread can access the variable at a time.
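As a sketch of the TriviallyCopyable point, a small struct works as well; whether the operations are lock-free depends on the platform and the struct's size, and the type below is made up for illustration:

#include <atomic>
#include <iostream>

struct Point {            // trivially copyable: no user-defined copy/move/destructor
    int x;
    int y;
};

std::atomic<Point> p{Point{0, 0}};

int main() {
    // may or may not be lock-free depending on the platform and struct size
    std::cout << "lock-free: " << p.is_lock_free() << '\n';
    p.store(Point{1, 2});          // the whole struct is replaced atomically
    Point snapshot = p.load();     // a reader sees either the old or the new pair, never a mix
    std::cout << snapshot.x << ", " << snapshot.y << '\n';
}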

If you do not have a TriviallyCopyable type, or you do not want to use a std::atomic, you could also use a conventional std::mutex and a std::lock_guard to control access:

std::mutex mutx;               // protects some_variable

{ // enter locking scope
    std::lock_guard<std::mutex> lock(mutx); // the lock guard locks the mutex on construction
    some_variable = some_value;             // do work
} // end of scope: the lock is destroyed and mutx is released

An important thing to keep in mind with this approach is that you want to keep the // do work section as short as possible, because while the mutex is locked no other thread can enter that section.

Another option would be to use a std::shared_timed_mutex (C++14) or std::shared_mutex (C++17), which allows multiple readers to share the mutex, but when you need to write you can still lock the mutex exclusively and write the data.
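A hedged sketch of that reader/writer split with std::shared_mutex (the variable and function names are made up):

#include <mutex>
#include <shared_mutex>

std::shared_mutex rw_mutex;
int shared_value = 0;

int read_value() {
    std::shared_lock<std::shared_mutex> lock(rw_mutex);  // many readers may hold this concurrently
    return shared_value;
}

void write_value(int v) {
    std::unique_lock<std::shared_mutex> lock(rw_mutex);  // exclusive: blocks readers and other writers
    shared_value = v;
}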

You do not want to use volatile to control synchronization, as jalf states in this answer:

For thread-safe accesses to shared data, we need a guarantee that:

  • the read/write actually happens (that the compiler won't just store the value in a register instead and defer updating main memory until much later)
  • that no reordering takes place. Assume that we use a volatile variable as a flag to indicate whether or not some data is ready to be read. In our code, we simply set the flag after preparing the data, so all looks fine. But what if the instructions are reordered so the flag is set first?

volatile does guarantee the first point. It also guarantees that no reordering occurs between different volatile reads/writes. All volatile memory accesses will occur in the order in which they're specified. That is all we need for what volatile is intended for: manipulating I/O registers or memory-mapped hardware, but it doesn't help us in multithreaded code where the volatile object is often only used to synchronize access to non-volatile data. Those accesses can still be reordered relative to the volatile ones.
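To make the flag scenario from that quote concrete, here is a sketch of the same pattern done correctly with std::atomic and release/acquire ordering (the data and flag names are made up): the release store cannot be reordered before the data write, and the acquire load guarantees the reader sees the prepared data once it sees the flag.

#include <atomic>
#include <cassert>
#include <thread>

int data = 0;                         // ordinary, non-atomic payload
std::atomic<bool> ready{false};       // the "data is ready" flag

void producer() {
    data = 42;                                    // prepare the data first
    ready.store(true, std::memory_order_release); // publish: not reordered before the data write
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {} // spin until the flag is set
    assert(data == 42);                                // guaranteed to see the published data
}

int main() {
    std::thread c(consumer), p(producer);
    p.join();
    c.join();
}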

As always, if you measure the performance and it is lacking, then you can try a different solution, but make sure to remeasure and compare after changing.

Lastly, Herb Sutter has an excellent presentation he did at C++ and Beyond 2012 called Atomic Weapons, described as follows:

This is a two-part talk that covers the C++ memory model, how locks and atomics and fences interact and map to hardware, and more. Even though we’re talking about C++, much of this is also applicable to Java and .NET which have similar memory models, but not all the features of C++ (such as relaxed atomics).

answered by NathanOliver

I'll add a little to the previous answers.

As explained previously, just using int or even volatile int is not enough, for various reasons (even with the memory-ordering constraints of Intel processors).

So, yes, you should use atomic types for that, but you need an extra consideration: atomic types guarantee coherent access, but if you have visibility concerns you need to specify a memory barrier (memory order).

Barriers enforce visibility and coherency between threads. On Intel and most modern architectures, they enforce cache synchronization so that updates are visible to every core. The problem is that this may be expensive if you're not careful.

Possible memory orders are:

  • relaxed: no special barrier, only coherent reads/writes are enforced;
  • sequential consistency: the strongest possible constraint (the default);
  • acquire: enforces that no loads after the current one are reordered before it, and adds the required barrier to ensure that released stores are visible;
  • consume: a simplified version of acquire that mostly only constrains reordering;
  • release: enforces that all stores before the current one are complete before it, and that the memory writes are done and visible to loads performing an acquire barrier.

So, if you want to be sure that updates to the variable are visible to readers, you need to flag your store with (at least) a release memory order and, on the reader side, you need (at least) an acquire memory order. Otherwise, readers may not see the actual version of the integer (they will at least see a coherent version, that is, the old or the new one, but not an ugly mix of the two).

Of course, the default behavior (full sequential consistency) will also give you the correct behavior, but at the expense of a lot of synchronization. In short, each time you add a barrier it forces cache synchronization, which is almost as expensive as several cache misses (and thus reads/writes in main memory).

So, in short you should declare your int as atomic and use the following code for store and load:

#include <atomic>

// Your variable
std::atomic<int> v{0};

// Read (in a reader thread)
int x = v.load(std::memory_order_acquire);

// Write (in the writer thread)
v.store(x, std::memory_order_release);

And just to complete: sometimes (more often than you think) you don't really need sequential consistency (or even the partial release/acquire consistency), since visibility of updates is pretty relative. When dealing with concurrent operations, an update takes place not when the write is performed but when others see the change, so reading the old value is probably not a problem!
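As a hedged illustration of that point, a simple event counter where readers only need an approximate, eventually-visible snapshot can use relaxed ordering (the counter and function names are made up):

#include <atomic>

std::atomic<long> events{0};

void record_event() {
    // relaxed: the increment itself is atomic, but no ordering with other memory is implied
    events.fetch_add(1, std::memory_order_relaxed);
}

long approximate_count() {
    // a possibly slightly stale, but never torn, value is fine here
    return events.load(std::memory_order_relaxed);
}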

I strongly recommend reading articles related to relativistic programming and RCU, here are some interesting links:

  • Relativistic Programming wiki: http://wiki.cs.pdx.edu/rp/
  • Structured Deferral: Synchronization via Procrastination: https://queue.acm.org/detail.cfm?id=2488549
  • Introduction to RCU Concepts: http://www.rdrop.com/~paulmck/RCU/RCU.LinuxCon.2013.10.22a.pdf
answered by Marwan Burelle