In order to solve this problem, C++ offers atomic variables that are thread-safe. The atomic type is implemented using mutex locks. If one thread acquires the mutex lock, then no other thread can acquire it until it is released by that particular thread.
If you have a program that reads a property atomically, this means that the property cannot change during this read operation. In Swift, operations on a dictionary are not atomic. This means that in a multithreaded application, a dictionary might be changed from one thread while another thread is reading from it.
Yes, it would be threadsafe. Assuming of course there are no bugs in the std::atomic implementation - but it's not usually hard to get right. This is exactly what std::atomic is meant to do.
What Does Atomic Operation Mean? Atomic operations in concurrent programming are program operations that run completely independently of any other processes. Atomic operations are used in many modern operating systems and parallel processing systems.
Each instantiation and full specialization of std::atomic<> represents a type that different threads can simultaneously operate on (their instances), without raising undefined behavior:
Objects of atomic types are the only C++ objects that are free from data races; that is, if one thread writes to an atomic object while another thread reads from it, the behavior is well-defined.
In addition, accesses to atomic objects may establish inter-thread synchronization and order non-atomic memory accesses as specified by
std::memory_order
.
std::atomic<>
wraps operations that, in pre-C++ 11 times, had to be performed using (for example) interlocked functions with MSVC or atomic bultins in case of GCC.
Also, std::atomic<>
gives you more control by allowing various memory orders that specify synchronization and ordering constraints. If you want to read more about C++ 11 atomics and memory model, these links may be useful:
Note that, for typical use cases, you would probably use overloaded arithmetic operators or another set of them:
std::atomic<long> value(0);
value++; //This is an atomic op
value += 5; //And so is this
Because operator syntax does not allow you to specify the memory order, these operations will be performed with std::memory_order_seq_cst
, as this is the default order for all atomic operations in C++ 11. It guarantees sequential consistency (total global ordering) between all atomic operations.
In some cases, however, this may not be required (and nothing comes for free), so you may want to use more explicit form:
std::atomic<long> value {0};
value.fetch_add(1, std::memory_order_relaxed); // Atomic, but there are no synchronization or ordering constraints
value.fetch_add(5, std::memory_order_release); // Atomic, performs 'release' operation
Now, your example:
a = a + 12;
will not evaluate to a single atomic op: it will result in a.load()
(which is atomic itself), then addition between this value and 12
and a.store()
(also atomic) of final result. As I noted earlier, std::memory_order_seq_cst
will be used here.
However, if you write a += 12
, it will be an atomic operation (as I noted before) and is roughly equivalent to a.fetch_add(12, std::memory_order_seq_cst)
.
As for your comment:
A regular
int
has atomic loads and stores. Whats the point of wrapping it withatomic<>
?
Your statement is only true for architectures that provide such guarantee of atomicity for stores and/or loads. There are architectures that do not do this. Also, it is usually required that operations must be performed on word-/dword-aligned address to be atomic std::atomic<>
is something that is guaranteed to be atomic on every platform, without additional requirements. Moreover, it allows you to write code like this:
void* sharedData = nullptr;
std::atomic<int> ready_flag = 0;
// Thread 1
void produce()
{
sharedData = generateData();
ready_flag.store(1, std::memory_order_release);
}
// Thread 2
void consume()
{
while (ready_flag.load(std::memory_order_acquire) == 0)
{
std::this_thread::yield();
}
assert(sharedData != nullptr); // will never trigger
processData(sharedData);
}
Note that assertion condition will always be true (and thus, will never trigger), so you can always be sure that data is ready after while
loop exits. That is because:
store()
to the flag is performed after sharedData
is set (we assume that generateData()
always returns something useful, in particular, never returns NULL
) and uses std::memory_order_release
order:
memory_order_release
A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable
sharedData
is used after while
loop exits, and thus after load()
from flag will return a non-zero value. load()
uses std::memory_order_acquire
order:
std::memory_order_acquire
A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread.
This gives you precise control over the synchronization and allows you to explicitly specify how your code may/may not/will/will not behave. This would not be possible if only guarantee was the atomicity itself. Especially when it comes to very interesting sync models like the release-consume ordering.
std::atomic
exists because many ISAs have direct hardware support for it
What the C++ standard says about std::atomic
has been analyzed in other answers.
So now let's see what std::atomic
compiles to to get a different kind of insight.
The main takeaway from this experiment is that modern CPUs have direct support for atomic integer operations, for example the LOCK prefix in x86, and std::atomic
basically exists as a portable interface to those intructions: What does the "lock" instruction mean in x86 assembly? In aarch64, LDADD would be used.
This support allows for faster alternatives to more general methods such as std::mutex
, which can make more complex multi-instruction sections atomic, at the cost of being slower than std::atomic
because std::mutex
it makes futex
system calls in Linux, which is way slower than the userland instructions emitted by std::atomic
, see also: Does std::mutex create a fence?
Let's consider the following multi-threaded program which increments a global variable across multiple threads, with different synchronization mechanisms depending on which preprocessor define is used.
main.cpp
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>
size_t niters;
#if STD_ATOMIC
std::atomic_ulong global(0);
#else
uint64_t global = 0;
#endif
void threadMain() {
for (size_t i = 0; i < niters; ++i) {
#if LOCK
__asm__ __volatile__ (
"lock incq %0;"
: "+m" (global),
"+g" (i) // to prevent loop unrolling
:
:
);
#else
__asm__ __volatile__ (
""
: "+g" (i) // to prevent he loop from being optimized to a single add
: "g" (global)
:
);
global++;
#endif
}
}
int main(int argc, char **argv) {
size_t nthreads;
if (argc > 1) {
nthreads = std::stoull(argv[1], NULL, 0);
} else {
nthreads = 2;
}
if (argc > 2) {
niters = std::stoull(argv[2], NULL, 0);
} else {
niters = 10;
}
std::vector<std::thread> threads(nthreads);
for (size_t i = 0; i < nthreads; ++i)
threads[i] = std::thread(threadMain);
for (size_t i = 0; i < nthreads; ++i)
threads[i].join();
uint64_t expect = nthreads * niters;
std::cout << "expect " << expect << std::endl;
std::cout << "global " << global << std::endl;
}
GitHub upstream.
Compile, run and disassemble:
comon="-ggdb3 -O3 -std=c++11 -Wall -Wextra -pedantic main.cpp -pthread"
g++ -o main_fail.out $common
g++ -o main_std_atomic.out -DSTD_ATOMIC $common
g++ -o main_lock.out -DLOCK $common
./main_fail.out 4 100000
./main_std_atomic.out 4 100000
./main_lock.out 4 100000
gdb -batch -ex "disassemble threadMain" main_fail.out
gdb -batch -ex "disassemble threadMain" main_std_atomic.out
gdb -batch -ex "disassemble threadMain" main_lock.out
Extremely likely "wrong" race condition output for main_fail.out
:
expect 400000
global 100000
and deterministic "correct" output of the others:
expect 400000
global 400000
Disassembly of main_fail.out
:
0x0000000000002780 <+0>: endbr64
0x0000000000002784 <+4>: mov 0x29b5(%rip),%rcx # 0x5140 <niters>
0x000000000000278b <+11>: test %rcx,%rcx
0x000000000000278e <+14>: je 0x27b4 <threadMain()+52>
0x0000000000002790 <+16>: mov 0x29a1(%rip),%rdx # 0x5138 <global>
0x0000000000002797 <+23>: xor %eax,%eax
0x0000000000002799 <+25>: nopl 0x0(%rax)
0x00000000000027a0 <+32>: add $0x1,%rax
0x00000000000027a4 <+36>: add $0x1,%rdx
0x00000000000027a8 <+40>: cmp %rcx,%rax
0x00000000000027ab <+43>: jb 0x27a0 <threadMain()+32>
0x00000000000027ad <+45>: mov %rdx,0x2984(%rip) # 0x5138 <global>
0x00000000000027b4 <+52>: retq
Disassembly of main_std_atomic.out
:
0x0000000000002780 <+0>: endbr64
0x0000000000002784 <+4>: cmpq $0x0,0x29b4(%rip) # 0x5140 <niters>
0x000000000000278c <+12>: je 0x27a6 <threadMain()+38>
0x000000000000278e <+14>: xor %eax,%eax
0x0000000000002790 <+16>: lock addq $0x1,0x299f(%rip) # 0x5138 <global>
0x0000000000002799 <+25>: add $0x1,%rax
0x000000000000279d <+29>: cmp %rax,0x299c(%rip) # 0x5140 <niters>
0x00000000000027a4 <+36>: ja 0x2790 <threadMain()+16>
0x00000000000027a6 <+38>: retq
Disassembly of main_lock.out
:
Dump of assembler code for function threadMain():
0x0000000000002780 <+0>: endbr64
0x0000000000002784 <+4>: cmpq $0x0,0x29b4(%rip) # 0x5140 <niters>
0x000000000000278c <+12>: je 0x27a5 <threadMain()+37>
0x000000000000278e <+14>: xor %eax,%eax
0x0000000000002790 <+16>: lock incq 0x29a0(%rip) # 0x5138 <global>
0x0000000000002798 <+24>: add $0x1,%rax
0x000000000000279c <+28>: cmp %rax,0x299d(%rip) # 0x5140 <niters>
0x00000000000027a3 <+35>: ja 0x2790 <threadMain()+16>
0x00000000000027a5 <+37>: retq
Conclusions:
the non-atomic version saves the global to a register, and increments the register.
Therefore, at the end, very likely four writes happen back to global with the same "wrong" value of 100000
.
std::atomic
compiles to lock addq
. The LOCK prefix makes the following inc
fetch, modify and update memory atomically.
our explicit inline assembly LOCK prefix compiles to almost the same thing as std::atomic
, except that our inc
is used instead of add
. Not sure why GCC chose add
, considering that our INC generated a decoding 1 byte smaller.
ARMv8 could use either LDAXR + STLXR or LDADD in newer CPUs: How do I start threads in plain C?
Tested in Ubuntu 19.10 AMD64, GCC 9.2.1, Lenovo ThinkPad P51.
I understand that
std::atomic<>
makes an object atomic.
That's a matter of perspective... you can't apply it to arbitrary objects and have their operations become atomic, but the provided specialisations for (most) integral types and pointers can be used.
a = a + 12;
std::atomic<>
does not (use template expressions to) simplify this to a single atomic operation, instead the operator T() const volatile noexcept
member does an atomic load()
of a
, then twelve is added, and operator=(T t) noexcept
does a store(t)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With