On Visual C++ 2013, when I compile the following code
#include <atomic>

int main()
{
    std::atomic<int> v(2);
    return v.fetch_add(1, std::memory_order_relaxed);
}
I get back the following assembly on x86:
51 push ecx
B8 02 00 00 00 mov eax,2
8D 0C 24 lea ecx,[esp]
87 01 xchg eax,dword ptr [ecx]
B8 01 00 00 00 mov eax,1
F0 0F C1 01 lock xadd dword ptr [ecx],eax
59 pop ecx
C3 ret
and similarly on x64:
B8 02 00 00 00 mov eax,2
87 44 24 08 xchg eax,dword ptr [rsp+8]
B8 01 00 00 00 mov eax,1
F0 0F C1 44 24 08 lock xadd dword ptr [rsp+8],eax
C3 ret
I simply don't understand: why does a relaxed increment of an int variable require a lock prefix?
Is there a reason for this, or did they simply not include the optimization of removing it?
* I used /O2 with /NoDefaultLib to trim the output down and get rid of unnecessary C runtime code, but that's irrelevant to the question.
Because the lock prefix is still required for the operation to be atomic; even with memory_order_relaxed, an increment/decrement is a read-modify-write, and x86 only makes a memory read-modify-write indivisible when it carries the lock prefix.
Imagine the same thing with no locks.
v = 0;
And then we spawn 100 threads, each with this command:
v++;
And then you wait for all threads to finish. What would you expect v to be? Unfortunately, it may not be 100. Say one thread loads the value 23; before it writes back 24, another thread also loads 23 and writes out 24. The two increments collapse into one, so the threads effectively cancel each other out. This happens because the increment itself is not atomic: the load, the add, and the store may each be atomic on their own, but the increment is multiple steps, so as a whole it is not.
But with std::atomic, all operations are atomic regardless of the std::memory_order setting. The only question is what order they happen in: memory_order_relaxed still guarantees atomicity; it just gives no ordering guarantees with respect to the other memory operations around it.
Atomic operations, even with relaxed ordering, still have to be atomic.
Even if a read-modify-write on current CPUs were atomic without a lock prefix (hint: it is not; another core can modify the cache line between the read and the write), that wouldn't be guaranteed for future CPUs.
It would be shortsighted to have all your binaries fail horribly on a newer architecture just because you wanted to shave a byte off your binary by relying on behavior that is not part of the x86-64 specification, and thus not guaranteed to be preserved in future processors.
Of course, multi-core systems are widespread, so in practice you do need a lock prefix for it to work on current CPUs. See Can num++ be atomic for 'int num'?