
C++11 atomic x86 memory ordering

Tags: c++, c++11, atomic

One of the docs on atomic variables in C++0x says the following when describing memory order:

Release-Acquire Ordering

On strongly-ordered systems (x86, SPARC, IBM mainframe), release-acquire ordering is automatic. No additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected...

First, is it true that x86 follows strict memory ordering? It seems very inefficient to always impose this. Does it mean that every write and read has a fence?

Also, if I have an aligned int on an x86 system, do atomic variables serve any purpose at all?

excalibur asked Aug 06 '12 21:08


2 Answers

Yes, it's true that x86 has strong memory ordering; see Volume 3A, Section 8.2 of the Intel manuals. Older x86 processors such as the 386 provided truly strict ordering semantics (called strong ordering), while more modern x86 processors relax this in a few cases. For ordinary memory accesses, the main relaxation is that a load may complete ahead of an older buffered store to a different address; loads are not reordered with other loads, and stores are not reordered with other stores. For example, the Pentium and 486 allow read cache misses to go ahead of buffered writes when the writes are cache hits (and are therefore to different addresses from the reads).

Yes, it can be inefficient. Sometimes high-performance software is written only for other architectures with looser memory ordering requirements because of this.

Yes, atomic variables still serve a purpose on x86. They have special semantics with the compiler such that a typical read-modify-write operation happens atomically. If you have two threads simultaneously incrementing an atomic variable (by which I mean a variable of type std::atomic<T> in C++11), you can be assured that the final value will be larger by exactly 2. Without std::atomic, the increment is a separate load, add, and store, so you might end up with the wrong value: one thread can hold the old value in a register while the other performs its increment, and one update is lost, even though each individual store to memory is atomic on x86.
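As a minimal sketch of the two-thread increment described above: the helper name `count_with_atomic` and the iteration count are made up for illustration, and relaxed ordering is used here only because the example needs atomicity of the increment and nothing else.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: two threads each increment a shared counter
// `per_thread` times; returns the final value of the counter.
int count_with_atomic(int per_thread) {
    std::atomic<int> counter{0};

    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            // One indivisible read-modify-write. A plain `++x` on an int
            // would be a separate load, add, and store, and could lose updates.
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();

    // Both threads are joined, so every increment is visible here.
    return counter.load();
}
```

Calling `count_with_atomic(100000)` should always return 200000. Replacing the counter with a plain `int` would be a data race, and in practice the result would often come out smaller.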

Adam Rosenfield answered Sep 22 '22 02:09


It is true that on x86 all stores have release and all loads have acquire semantics.

That doesn't and shouldn't affect the way you write C++: To write concurrent, race-free code you have to use either std::atomic constructions or locks.

What the architectural details mean is that on x86 there will be very little or no extra code generated for operations on atomic word-sized types as long as you ask for at most acquire/release ordering. (Sequentially consistent stores will emit an mfence or a locked xchg instruction, though.) However, you still must use the C++ atomic types and cannot just omit them in order to have a correct, well-formed program. One important feature of atomic variables is that they prevent compiler reordering, which is essential to the correctness of your program.
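To illustrate acquire/release ordering, here is a small sketch of one thread publishing a plain value through an atomic flag. The function name `publish_and_consume` and the value 42 are hypothetical. On x86 the release store and acquire load would typically compile to ordinary mov instructions, but the std::atomic flag is still what stops the compiler from reordering the accesses or hoisting the load out of the loop.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: a producer writes a plain int and then sets a flag
// with release ordering; a consumer waits on the flag with acquire ordering
// and then reads the int. Returns the value the consumer saw.
int publish_and_consume() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // synchronization flag
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // ordinary store
        ready.store(true, std::memory_order_release);  // publishes payload
    });

    std::thread consumer([&] {
        // Spin until the flag is set; the acquire load pairs with the
        // release store above, so the write to payload is visible afterwards.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

With `ready` declared as a plain `bool` instead, the program would contain a data race and the compiler would be free to read the flag once and spin forever.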

(Pre-C++11, you would have had to use compiler-provided extensions such as GCC's __sync_* suite of functions, which would make the compiler behave correctly. If you really wanted to use naked variables, you would at least have to insert compiler barriers yourself.)
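For comparison, a rough sketch of the pre-C++11 style, assuming GCC or Clang. `__sync_fetch_and_add` is a compiler builtin, and the empty asm statement is the usual GCC-style compiler barrier; neither is standard C++. The helper names are made up, and std::thread is used only to keep the example short (period code would have used pthreads or similar).

```cpp
#include <thread>

// Compiler-only barrier (GCC/Clang extension): emits no instruction, but
// the compiler may not move memory accesses across it.
inline void compiler_barrier() {
    asm volatile("" ::: "memory");
}

// Hypothetical helper: same two-thread counter as with std::atomic, but on
// a plain int that is only touched through the builtin while threads run.
int count_with_sync_builtin(int per_thread) {
    int counter = 0;

    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            // Atomic add that also acts as a full memory barrier.
            __sync_fetch_and_add(&counter, 1);
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();

    compiler_barrier();  // not needed after join(); shown only as usage
    return counter;      // safe to read directly: both threads are joined
}
```

Note that the compiler barrier alone does not make an increment atomic; it only constrains compiler reordering, which is why the builtin is still needed for the read-modify-write.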

Kerrek SB answered Sep 19 '22 02:09