If T is a C++ fundamental type, and if std::atomic<T>::is_lock_free() returns true, then is there anything in std::atomic<T> that is wait-free (not just lock-free)? Like load, store, fetch_add, fetch_sub, compare_exchange_weak, and compare_exchange_strong.
Can you also answer based on what is specified in the C++ Standard, and what is implemented in Clang and/or GCC (your version of choice).
My favorite definitions for lock-free and wait-free are taken from C++ Concurrency in Action (available for free). An algorithm is lock-free if it satisfies the first condition below, and it is wait-free if it satisfies both conditions below:

1. If any of the threads performing the operation is suspended, the remaining threads can still complete the operation.
2. Every thread performing the operation completes it in a bounded number of steps, regardless of the behavior of the other threads.
Since there are many definitions of wait-freedom [1] and people choose different ones, I think that a precise definition is paramount, and a distinction between its specializations is necessary and useful.
These are the universally accepted definitions of wait-freedom and its specializations:
wait-free: All threads will make progress in a finite number of steps.
wait-free bounded: All threads will make progress in a bounded number of steps, which may depend on the number of threads.
wait-free population-oblivious [3]: All threads will make progress in a fixed number of steps that does not depend on the number of threads.
Overall, the C++ standard makes no distinction between lock-free and wait-free (see this other answer). It always gives guarantees no stronger than lock-free.
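To make the distinction concrete, here is a sketch (not from the answer) contrasting a wait-free update, which completes in a fixed number of steps on architectures where fetch_add is a single instruction (e.g. lock xadd on x86), with a lock-free CAS loop, which may retry an unbounded number of times under contention:

```cpp
#include <atomic>

// Wait-free on x86: compiles to a single lock xadd, so every thread
// completes in a fixed number of steps regardless of contention.
int increment(std::atomic<int>& counter) {
    return counter.fetch_add(1, std::memory_order_relaxed);
}

// Lock-free only: the CAS can fail and retry indefinitely if other
// threads keep changing the value, but each failure means some other
// thread succeeded, so the system as a whole makes progress.
void multiply_by_two(std::atomic<int>& counter) {
    int expected = counter.load(std::memory_order_relaxed);
    while (!counter.compare_exchange_weak(expected, expected * 2)) {
        // On failure, expected is refreshed with the current value; retry.
    }
}
```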
When std::atomic<T>::is_lock_free() returns true, the implementation uses atomic instructions instead of mutexes, possibly with CAS loops or LL/SC loops.
Atomic instructions are wait-free. CAS and LL/SC loops are lock-free.
How a method is implemented depends on many factors, including its usage, the compiler, the target architecture and its version. For example:
- fetch_add() for std::atomic<double> uses a CAS loop (lock cmpxchg), while for std::atomic<int> it uses lock add or lock xadd.
- fetch_add() uses an LL/SC loop if no better instructions are available. For example, this is not the case on ARM versions 8.1 and above, where ldaddal is used for non-relaxed std::atomic<int> and ldadd is used if relaxed.
- fetch_or() uses lock or if the return value is not used, otherwise it uses a CAS loop (lock cmpxchg).

As explained in this answer of mine: the store() method and the lock add, lock xadd, and lock or instructions are wait-free population-oblivious, while their "algorithm", that is, the work performed by the hardware to lock the cache line, is wait-free bounded. The load() method is always wait-free population-oblivious.

[1] For example:
all threads will make progress in a finite number of steps (source)
all threads will make progress in a bounded number of steps [2]
per "step" that they all execute, all threads will make forward progress without any starvation (source)
[2] It is not clear whether the bound is constant, or whether it may depend on the number of threads.
[3] A strange name, and not a good one for an acronym, so maybe another should be chosen.
There exist universally accepted definitions of lock-freedom and wait-freedom, and the definitions provided in your question are consistent with those. I would strongly assume that the C++ standard committee sticks to the definitions that are universally accepted in the scientific community.
In general, publications on lock-free/wait-free algorithms assume that CPU instructions are wait-free. Instead, the arguments about progress guarantees focus on the software algorithm.
Based on this assumption, I would argue that any std::atomic method that can be translated to a single atomic instruction for some architecture is wait-free on that specific architecture. Whether such a translation is possible sometimes depends on how the method is used, though. Take for example fetch_or. On x86 this can be translated to lock or, but only if you do not use its return value, because that instruction does not provide the original value. If you use the return value, then the compiler will create a CAS loop, which is lock-free, but not wait-free. (And the same goes for fetch_and/fetch_xor.)
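A sketch of what such a fetch_or amounts to when the previous value is needed and no single instruction can return it (the function name is illustrative, not from any real implementation):

```cpp
#include <atomic>

// Equivalent of fetch_or(mask) when the old value is required: a loop
// around compare_exchange_weak (lock cmpxchg on x86). This is
// lock-free, but not wait-free, since the loop may retry indefinitely
// under contention.
unsigned fetch_or_via_cas(std::atomic<unsigned>& a, unsigned mask) {
    unsigned expected = a.load(std::memory_order_relaxed);
    while (!a.compare_exchange_weak(expected, expected | mask)) {
        // On failure, expected is reloaded with the current value; retry.
    }
    return expected; // the value *before* the OR, just like fetch_or
}
```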
So which methods are actually wait-free depends not only on the compiler, but mostly on the target architecture.
Whether it is technically correct to assume that a single instruction is actually wait-free is a rather philosophical question, IMHO. True, it might not be guaranteed that an instruction finishes execution within a bounded number of "steps" (whatever such a step might be), but the machine instruction is still the smallest unit on the lowest level that we can see and control. Actually, if you cannot assume that a single instruction is wait-free, then strictly speaking it is not possible to run any real-time code on that architecture, because real-time also requires strict bounds on time/the number of steps.
This is what the C++17 standard states in [intro.progress]:
Executions of atomic functions that are either defined to be lock-free (32.8) or indicated as lock-free (32.5) are lock-free executions.
- If there is only one thread that is not blocked (3.6) in a standard library function, a lock-free execution in that thread shall complete. [ Note: Concurrently executing threads may prevent progress of a lock-free execution. For example, this situation can occur with load-locked store-conditional implementations. This property is sometimes termed obstruction-free. — end note ]
- When one or more lock-free executions run concurrently, at least one should complete. [ Note: It is difficult for some implementations to provide absolute guarantees to this effect, since repeated and particularly inopportune interference from other threads may prevent forward progress, e.g., by repeatedly stealing a cache line for unrelated purposes between load-locked and store-conditional instructions. Implementations should ensure that such effects cannot indefinitely delay progress under expected operating conditions, and that such anomalies can therefore safely be ignored by programmers. Outside this document, this property is sometimes termed lock-free. — end note ]
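The load-locked/store-conditional interference described in the note is also why compare_exchange_weak is allowed to fail spuriously, and why correct code wraps it in a loop. A minimal sketch (set_max is an illustrative helper, not a standard function):

```cpp
#include <atomic>

// compare_exchange_weak may fail even when the stored value equals
// `current` (e.g. the store-conditional lost its cache-line reservation
// on an LL/SC machine), so correct code always retries in a loop.
void set_max(std::atomic<int>& a, int candidate) {
    int current = a.load(std::memory_order_relaxed);
    while (current < candidate &&
           !a.compare_exchange_weak(current, candidate)) {
        // On failure, current is refreshed; the loop re-checks the
        // condition against the newly observed value.
    }
}
```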
The other answer correctly pointed out that my original answer was a bit imprecise, since there exist two stronger subtypes of wait-freedom.
So strictly speaking, the definition in the question is consistent with the definition of wait-free bounded.
In practice, most wait-free algorithms are actually wait-free bounded or even wait-free population oblivious, i.e., it is possible to determine an upper bound on the number of steps.