Given the following test program:
#include <atomic>
#include <cstdint>
#include <iostream>
int64_t process_one() {
    int64_t a;
    //Should be atomic on my haswell
    int64_t assign = 42;
    a = assign;
    return a;
}
int64_t process_two() {
    std::atomic<int64_t> a;
    int64_t assign = 42;
    a = assign;
    return a;
}
int main() {
    auto res_one = process_one();
    auto res_two = process_two();
    std::cout << res_one << std::endl;
    std::cout << res_two << std::endl;
}
Compiled with:
g++ --std=c++17 -O3 -march=native main.cpp
The code generated the following asm for the two functions:
00000000004007c0 <_Z11process_onev>:
4007c0: b8 2a 00 00 00 mov $0x2a,%eax
4007c5: c3 retq
4007c6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4007cd: 00 00 00
00000000004007d0 <_Z11process_twov>:
4007d0: 48 c7 44 24 f8 2a 00 movq $0x2a,-0x8(%rsp)
4007d7: 00 00
4007d9: 0f ae f0 mfence
4007dc: 48 8b 44 24 f8 mov -0x8(%rsp),%rax
4007e1: c3 retq
4007e2: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4007e9: 00 00 00
4007ec: 0f 1f 40 00 nopl 0x0(%rax)
Personally I don't read much assembly, but (and I might be mistaken here) it seems that process_two compiles to everything process_one does, and then some.
However, as far as I know, 'modern' x86-64 processors (e.g. Haswell, on which I compiled this) perform an aligned 8-byte assignment atomically without any extra operations (in this case I believe the extra operation is the mfence instruction in process_two).
So why doesn't gcc just optimize process_two to behave exactly the same as process_one, given the flags I compiled with?
Are there still cases where an atomic store behaves differently from an assignment to a normal variable, given that both are 8 bytes?
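The reason is that operations on std::atomic default to the strongest memory ordering. The assignment a = assign; is a store that uses the default argument
std::memory_order order = std::memory_order_seq_cst
Atomicity is not the issue here: as you say, a plain aligned 8-byte mov is already atomic on x86-64. Sequential consistency additionally requires that the store is not reordered with later loads in the same thread, and x86 would otherwise allow a later load from a different location to pass an earlier store. The compiler rules that out by emitting the mfence instruction after the store.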
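To make the implicit default visible, here is a minimal sketch of process_two with the ordering written out. The name process_two_explicit is made up for illustration; as far as the standard is concerned it should mean the same thing as the original function.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical variant of process_two from the question: same behaviour,
// but the default memory order is spelled out on both operations.
int64_t process_two_explicit() {
    std::atomic<int64_t> a;
    int64_t assign = 42;
    // Equivalent to `a = assign;`
    a.store(assign, std::memory_order_seq_cst);
    // Equivalent to the implicit conversion in `return a;`
    return a.load(std::memory_order_seq_cst);
}
```

I would expect this to compile to the same code as the original, mfence included, since only the spelling changed.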
Change your
a = assign;
to
a.store(assign, std::memory_order_relaxed);
and the generated assembly will change from
process_two():
mov QWORD PTR [rsp-8], 42
mfence
mov rax, QWORD PTR [rsp-8]
ret
to
process_two():
mov QWORD PTR [rsp-8], 42
mov rax, QWORD PTR [rsp-8]
ret
Just as you expected it to be.
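For completeness, here is a sketch of the whole function with weaker orderings, assuming you really do not need sequential consistency. The function names process_two_relaxed and process_two_release are my own, not from the question.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical variant: atomicity only, no ordering guarantees relative
// to other memory operations.
int64_t process_two_relaxed() {
    std::atomic<int64_t> a;
    int64_t assign = 42;
    a.store(assign, std::memory_order_relaxed);
    return a.load(std::memory_order_relaxed);
}

// Hypothetical variant: release store paired with an acquire load, the
// usual choice when one thread publishes data for another to read.
int64_t process_two_release() {
    std::atomic<int64_t> a;
    int64_t assign = 42;
    a.store(assign, std::memory_order_release);
    return a.load(std::memory_order_acquire);
}
```

On x86-64, release stores and acquire loads map to plain mov instructions, so neither variant should need an mfence, though it is worth checking your own disassembly. Keep in mind that relaxed only guarantees the store itself is atomic. It says nothing about how it is ordered against accesses to other variables, so it is not a drop-in replacement when other threads rely on that ordering.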