Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concurrency: Atomic and volatile in C++11 memory model

A global variable is shared across 2 concurrently running threads on 2 different cores. The threads writes to and read from the variables. For the atomic variable can one thread read a stale value? Each core might have a value of the shared variable in its cache and when one threads writes to its copy in a cache the other thread on a different core might read stale value from its own cache. Or the compiler does strong memory ordering to read the latest value from the other cache? The c++11 standard library has std::atomic support. How this is different from the volatile keyword? How volatile and atomic types will behave differently in the above scenario?

like image 262
Abhijit-K Avatar asked Jan 11 '12 12:01

Abhijit-K


People also ask

What is volatile memory in C?

Volatile is used in C programming when we need to go and read the value stored by the pointer at the address pointed by the pointer. If you need to change anything in your code that is out of compiler reach you can use this volatile keyword before the variable for which you want to change the value.

Is Atomic volatile C++?

Firstly, volatile does not imply atomic access. It is designed for things like memory mapped I/O and signal handling. volatile is completely unnecessary when used with std::atomic , and unless your platform documents otherwise, volatile has no bearing on atomic access or memory ordering between threads.

What is the difference between atomic volatile synchronized?

Main Differences: LOCKING: The synchronized keyword is used to implement a lock-based threading model, whereas volatile and atomic variables give are non-lock based, which means threads do not require a lock to access these variables.

Does STD Atomic need volatile?

So no, don't use volatile , use std::memory_order_acquire and std::memory_order_release .


2 Answers

Firstly, volatile does not imply atomic access. It is designed for things like memory mapped I/O and signal handling. volatile is completely unnecessary when used with std::atomic, and unless your platform documents otherwise, volatile has no bearing on atomic access or memory ordering between threads.

If you have a global variable which is shared between threads, such as:

std::atomic<int> ai; 

then the visibility and ordering constraints depend on the memory ordering parameter you use for operations, and the synchronization effects of locks, threads and accesses to other atomic variables.

In the absence of any additional synchronization, if one thread writes a value to ai then there is nothing that guarantees that another thread will see the value in any given time period. The standard specifies that it should be visible "in a reasonable period of time", but any given access may return a stale value.

The default memory ordering of std::memory_order_seq_cst provides a single global total order for all std::memory_order_seq_cst operations across all variables. This doesn't mean that you can't get stale values, but it does mean that the value you do get determines and is determined by where in this total order your operation lies.

If you have 2 shared variables x and y, initially zero, and have one thread write 1 to x and another write 2 to y, then a third thread that reads both may see either (0,0), (1,0), (0,2) or (1,2) since there is no ordering constraint between the operations, and thus the operations may appear in any order in the global order.

If both writes are from the same thread, which does x=1 before y=2 and the reading thread reads y before x then (0,2) is no longer a valid option, since the read of y==2 implies that the earlier write to x is visible. The other 3 pairings (0,0), (1,0) and (1,2) are still possible, depending how the 2 reads interleave with the 2 writes.

If you use other memory orderings such as std::memory_order_relaxed or std::memory_order_acquire then the constraints are relaxed even further, and the single global ordering no longer applies. Threads don't even necessarily have to agree on the ordering of two stores to separate variables if there is no additional synchronization.

The only way to guarantee you have the "latest" value is to use a read-modify-write operation such as exchange(), compare_exchange_strong() or fetch_add(). Read-modify-write operations have an additional constraint that they always operate on the "latest" value, so a sequence of ai.fetch_add(1) operations by a series of threads will return a sequence of values with no duplicates or gaps. In the absence of additional constraints, there's still no guarantee which threads will see which values though. In particular, it is important to note that the use of an RMW operation does not force changes from other threads to become visible any quicker, it just means that if the changes are not seen by the RMW then all threads must agree that they are later in the modification order of that atomic variable than the RMW operation. Stores from different threads can still be delayed by arbitrary amounts of time, depending on when the CPU actually issues the store to memory (rather than just its own store buffer), physically how far apart the CPUs executing the threads are (in the case of a multi-processor system), and the details of the cache coherency protocol.

Working with atomic operations is a complex topic. I suggest you read a lot of background material, and examine published code before writing production code with atomics. In most cases it is easier to write code that uses locks, and not noticeably less efficient.

like image 196
Anthony Williams Avatar answered Oct 22 '22 07:10

Anthony Williams


volatile and the atomic operations have a different background, and were introduced with a different intent.

volatile dates from way back, and is principally designed to prevent compiler optimizations when accessing memory mapped IO. Modern compilers tend to do no more than suppress optimizations for volatile, although on some machines, this isn't sufficient for even memory mapped IO. Except for the special case of signal handlers, and setjmp, longjmp and getjmp sequences (where the C standard, and in the case of signals, the Posix standard, gives additional guarantees), it must be considered useless on a modern machine, where without special additional instructions (fences or memory barriers), the hardware may reorder or even suppress certain accesses. Since you shouldn't be using setjmp et al. in C++, this more or less leaves signal handlers, and in a multithreaded environment, at least under Unix, there are better solutions for those as well. And possibly memory mapped IO, if you're working on kernal code and can ensure that the compiler generates whatever is needed for the platform in question. (According to the standard, volatile access is observable behavior, which the compiler must respect. But the compiler gets to define what is meant by “access”, and most seem to define it as “a load or store machine instruction was executed”. Which, on a modern processor, doesn't even mean that there is necessarily a read or write cycle on the bus, much less that it's in the order you expect.)

Given this situation, the C++ standard added atomic access, which does provide a certain number of guarantees across threads; in particular, the code generated around an atomic access will contain the necessary additional instructions to prevent the hardware from reordering the accesses, and to ensure that the accesses propagate down to the global memory shared between cores on a multicore machine. (At one point in the standardization effort, Microsoft proposed adding these semantics to volatile, and I think some of their C++ compilers do. After discussion of the issues in the committee, however, the general consensus—including the Microsoft representative—was that it was better to leave volatile with its orginal meaning, and to define the atomic types.) Or just use the system level primitives, like mutexes, which execute whatever instructions are needed in their code. (They have to. You can't implement a mutex without some guarantees concerning the order of memory accesses.)

like image 29
James Kanze Avatar answered Oct 22 '22 07:10

James Kanze