Given an example:
#include <thread>
#include <iostream>
int main() {
int a = 0;
volatile int flag = 0;
std::thread t1([&]() {
while (flag != 1);
int b = a;
std::cout << "b = " << b << std::endl;
});
std::thread t2([&]() {
a = 5;
flag = 1;
});
t1.join();
t2.join();
return 0;
}
It is conceptually understandable that flag = 1;
could be reordered and executed before a = 5;
, therefore the result of b could be 5 or 0.
However, realistically, I cannot produce a result that output 0 on my machine. How could we guarantee the behavior or instruction reorder reproducibly? How to change the code example specifically?
First of all: You are in the land of UB because there is a race condition: Both flag
and a
are written and read from different threads without proper synchronization - this is always a data race. The C++ standard does not impose any requirements on implementations when you give them such a program.
There is therefore no way to "guarantee" a specific behavior.
However, we can look at assembly output to determine what a given compiled program can or cannot do. I was not successful at using reordering alone to show the problem with volatile
as synchronization mechanism, but below is a demonstration using a related optimization.
Here is an example of a program that has no data races:
std::atomic<int> a = 0;
std::atomic<int> flag = 0;
std::thread t1([&]() {
while (flag != 1);
int b = a;
std::cout << "b = " << b << std::endl;
});
std::thread t2([&]() {
a = 5;
int x = 1000000;
while (x-- > 1) flag = 0;
flag = 1;
x = 1000000;
while (x-- > 1) flag = 1;
flag = 0;
a = 0;
});
t1.join();
t2.join();
https://wandbox.org/permlink/J1aw4rJP7P9o1h7h
Indeed, the usual output of this program is b = 5
(other outputs are possible, or the program might not terminate at all with "unlucky" scheduling, but there is no UB).
If we use improper synchronization instead, we can see in the assembly that this output is no longer in the realm of possibility (given the guarantees of the x86 platform):
int a = 0;
volatile int flag = 0;
std::thread t1([&]() {
while (flag != 1);
int b = a;
std::cout << "b = " << b << std::endl;
});
std::thread t2([&]() {
a = 5;
int x = 1000000;
while (x-- > 1) flag = 0;
flag = 1;
x = 1000000;
while (x-- > 1) flag = 1;
flag = 0;
a = 0;
});
t1.join();
t2.join();
The assembly for the second thread body, as per https://godbolt.org/z/qsjca1:
std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#2}> > >::_M_run():
mov rcx, QWORD PTR [rdi+8]
mov rdx, QWORD PTR [rdi+16]
mov eax, 999999
.L4:
mov DWORD PTR [rdx], 0
sub eax, 1
jne .L4
mov DWORD PTR [rdx], 1
mov eax, 999999
.L5:
mov DWORD PTR [rdx], 1
sub eax, 1
jne .L5
mov DWORD PTR [rdx], 0
mov DWORD PTR [rcx], 0
ret
Notice how a = 5;
has been completely optimized away. Nowhere in the compiled program does a
get a chance to take the value 5
.
As you can see in https://wandbox.org/permlink/Pnbh38QpyqKzIClY, the program will always output 0 (or not terminate), even though the original C++ code for thread 2 would - in a "naive" interpretation - always have a == 5
while flag == 1
.
The while
loops are of course to "burn time" and give the other thread a chance to interleave - sleep
or other system calls would generally constitute a memory barrier for the compiler and might break the effect of the second snippet.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With