Anything in std::atomic is wait-free?

If T is a C++ fundamental type, and std::atomic<T>::is_lock_free() returns true, is there anything in std::atomic<T> that is wait-free (not just lock-free)? For example: load, store, fetch_add, fetch_sub, compare_exchange_weak, and compare_exchange_strong.

Please answer both in terms of what the C++ Standard specifies and in terms of what Clang and/or GCC (a version of your choice) actually implement.

My favorite definitions for lock-free and wait-free are taken from C++ Concurrency in Action (available for free). An algorithm is lock-free if it satisfies the first condition below, and it is wait-free if it satisfies both conditions below:

  1. If one of the threads accessing the data structure is suspended by the scheduler midway through its operation, the other threads must still be able to complete their operations without waiting for the suspended thread.
  2. Every thread accessing the data structure can complete its operation within a bounded number of steps, regardless of the behavior of other threads.
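
For concreteness, here is a minimal sketch of the kind of code I have in mind (just the operations named above, not a complete program doing useful work):

    #include <atomic>
    #include <cstdio>

    std::atomic<int> counter{0};

    int main() {
        // The question: when this reports true, which of the operations
        // below are wait-free rather than merely lock-free?
        std::printf("lock-free: %d\n", counter.is_lock_free());

        counter.store(1);
        int v = counter.load();

        counter.fetch_add(1);
        counter.fetch_sub(1);

        int expected = v;
        counter.compare_exchange_weak(expected, v + 1);
        counter.compare_exchange_strong(expected, v + 1);
    }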
asked May 17 '20 by Koosha


2 Answers

Since there are many definitions of wait-freedom1 and people choose different ones, I think that a precise definition is paramount, and a distinction between its specializations is necessary and useful.

These are the universally accepted definitions of wait-freedom and its specializations:

  • wait-free: All threads will make progress in a finite number of steps.

  • wait-free bounded: All threads will make progress in a bounded number of steps, which may depend on the number of threads.

  • wait-free population-oblivious3: All threads will make progress in a fixed number of steps, which does not depend on the number of threads.

Overall, the C++ standard makes no distinction between lock-free and wait-free (see this other answer). It always gives guarantees no stronger than lock-free.

When std::atomic<T>::is_lock_free() returns true, the implementation uses atomic instructions, possibly inside CAS loops or LL/SC loops, instead of mutexes.
Atomic instructions are wait-free. CAS and LL/SC loops are lock-free.

How a method is implemented depends on many factors, including its usage, the compiler, the target architecture and its version. For example:

  • As someone says, on x86 gcc, fetch_add() for std::atomic<double> uses a CAS loop (lock cmpxchg), while for std::atomic<int> it uses lock add or lock xadd (a sketch of such a CAS loop follows this list).
  • As someone else says, on architectures featuring LL/SC instructions, fetch_add() uses an LL/SC loop if no better instruction is available. For example, this is not the case on ARM versions 8.1 and above, where ldaddal is used for non-relaxed std::atomic<int> and ldadd is used if relaxed.
  • As stated in this other answer, on x86 gcc fetch_or() uses lock or if the return value is not used, otherwise it uses a CAS loop (lock cmpxchg).
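
To make the double case concrete, this is roughly what such a CAS loop looks like when written by hand at the C++ level (a sketch only; the code actually generated by the compiler/library may differ):

    #include <atomic>

    // Sketch of the CAS loop behind fetch_add for std::atomic<double>:
    // x86 has no single "lock add" for doubles, so the operation retries
    // with lock cmpxchg until it succeeds. This is lock-free, but not
    // wait-free: a thread may retry indefinitely if others keep winning.
    double fetch_add_double(std::atomic<double>& a, double arg) {
        double expected = a.load(std::memory_order_relaxed);
        while (!a.compare_exchange_weak(expected, expected + arg)) {
            // expected now holds the freshly observed value; retry with it.
        }
        return expected;  // value seen just before our addition took effect
    }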

As explained in this answer of mine:

  • The store() method and lock add, lock xadd, lock or instructions are wait-free population-oblivious, while their "algorithm", that is the work performed by the hardware to lock the cache line, is wait-free bounded.
  • The load() method is always wait-free population-oblivious.

1 For example:

  • all threads will make progress in a finite number of steps (source)

  • all threads will make progress in a bounded number of steps2

  • per "step" that they all execute, all threads will make forward progress without any starvation (source)

2 It is not clear whether the bound is constant or whether it may depend on the number of threads.

3 A strange name and not good for an acronym, so maybe another one should be chosen.

answered Sep 24 '22 by Atom 12


There exist universally accepted definitions of lock-freedom and wait-freedom, and the definitions provided in your question are consistent with those. I would strongly assume that the C++ standard committee sticks to the definitions that are universally accepted in the scientific community.

In general, publications on lock-free/wait-free algorithms assume that CPU instructions are wait-free; the arguments about progress guarantees instead focus on the software algorithm.

Based on this assumption I would argue that any std::atomic method that can be translated to a single atomic instruction for some architecture is wait-free on that specific architecture. Whether such a translation is possible sometimes depends on how the method is used though. Take for example fetch_or. On x86 this can be translated to lock or, but only if you do not use its return value, because this instruction does not provide the original value. If you use the return value, then the compiler will create a CAS-loop, which is lock-free, but not wait-free. (And the same goes for fetch_and/fetch_xor.)
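
A small illustration of that fetch_or point (the function names are mine, and how this actually compiles of course depends on the compiler version, flags, and target):

    #include <atomic>

    std::atomic<unsigned> flags{0};

    // Return value ignored: on x86, gcc can translate this to a single
    // lock or instruction, which would be wait-free on that architecture.
    void set_bit() {
        flags.fetch_or(0x1u);
    }

    // Return value used: the original value is needed, and lock or does not
    // provide it, so the compiler emits a CAS loop (lock cmpxchg) instead,
    // which is lock-free but not wait-free.
    bool test_and_set_bit() {
        return flags.fetch_or(0x1u) & 0x1u;
    }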

So which methods are actually wait-free depends not only on the compiler, but mostly on the target architecture.

Whether it is technically correct to assume that a single instruction is actually wait-free is a rather philosophical question IMHO. True, it might not be guaranteed that an instruction finishes execution within a bounded number of "steps" (whatever such a step might be), but the machine instruction is still the smallest unit on the lowest level that we can see and control. Actually, if you cannot assume that a single instruction is wait-free, then strictly speaking it is not possible to run any real-time code on that architecture, because real-time also requires strict bounds on time/the number of steps.


This is what the C++17 standard states in [intro.progress]:

Executions of atomic functions that are either defined to be lock-free (32.8) or indicated as lock-free (32.5) are lock-free executions.

  • If there is only one thread that is not blocked (3.6) in a standard library function, a lock-free execution in that thread shall complete. [ Note: Concurrently executing threads may prevent progress of a lock-free execution. For example, this situation can occur with load-locked store-conditional implementations. This property is sometimes termed obstruction-free. — end note ]
  • When one or more lock-free executions run concurrently, at least one should complete. [ Note: It is difficult for some implementations to provide absolute guarantees to this effect, since repeated and particularly inopportune interference from other threads may prevent forward progress, e.g., by repeatedly stealing a cache line for unrelated purposes between load-locked and store-conditional instructions. Implementations should ensure that such effects cannot indefinitely delay progress under expected operating conditions, and that such anomalies can therefore safely be ignored by programmers. Outside this document, this property is sometimes termed lock-free. — end note ]

The other answer correctly pointed out that my original answer was a bit imprecise, since there exist two stronger subtypes of wait-freedom.

  • wait-free - A method is wait-free if it guarantees that every call finishes its execution in a finite number of steps, i.e., it may not be possible to determine an upper bound, but the number of steps must still be guaranteed to be finite.
  • wait-free bounded - A method is wait-free bounded if it guarantees that every call finishes its execution in a bounded number of steps, where this bound may depend on the number of threads.
  • wait-free population oblivious - A method is wait-free population oblivious if it guarantees that every call finishes its execution in a bounded number of steps, and this bound does not depend on the number of threads.

So strictly speaking, the definition in the question is consistent with the definition of wait-free bounded.

In practice, most wait-free algorithms are actually wait-free bounded or even wait-free population oblivious, i.e., it is possible to determine an upper bound on the number of steps.
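
For example, a shared counter built on a single fetch_add is the textbook wait-free population-oblivious case, assuming fetch_add maps to one atomic instruction on the target (e.g. lock xadd on x86); with a CAS-loop fallback it would only be lock-free:

    #include <atomic>

    class TicketDispenser {
        std::atomic<unsigned long> next_{0};
    public:
        // One atomic RMW per call: the number of steps is fixed and does
        // not depend on how many threads call take() concurrently.
        unsigned long take() {
            return next_.fetch_add(1, std::memory_order_relaxed);
        }
    };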

answered Sep 23 '22 by mpoeter