Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are lock-free atomics address-free in practice?

Boost.Interprocess is a wonderful library that simplifies the usage of shared memory amongst different processes. It provides mutexes, condition variables, and semaphores, which allow for synchronization when writing and reading from the shared memory.

However, in some situations these (relatively) performance-intensive synchronization mechanisms are not necessary - atomic operations suffice for my use case, and will likely give much better performance.

Unfortunately, Boost.Interprocess does not seem to come with atomics.


The C++ Standard Library provides the std::atomic class template, which encapsulates objects whose operations need to be atomic, and also has functions to test if atomic operations are lock-free. But it does not require lock-free atomics to be address-free as well: [atomics.lockfree]/4 merely encourages that lock-free operations be address-free, and this is in agreement with cppreference.

I cannot think of any reason why one would implement lock-free atomics in a non-address-free manner. It even appears to me to be considerably easier to implement lock-free atomics in an address-free manner.

Since I would gain significant performance benefits when using atomics instead of mutexes (from Boost.Interprocess), it seems tempting to discount standard-compliance here and store std::atomic objects in the shared memory.


There are two parts to this question:

  1. Do CPUs implement lock-free atomics in an address-free manner in practice? (I only care about CPUs that are used to run modern desktop and mobile operating systems (e.g. Windows, MacOS, Linux, Android, iOS), but not embedded systems)
  2. Why on earth would an implementation use non-address-free atomics that are lock-free?
like image 831
Bernard Avatar asked Jul 22 '18 08:07

Bernard


People also ask

Are Atomic locks free?

atomic<T> variables don't use locks (at least where T is natively atomic on your platform), but they're not lock-free in the sense above. You might use them in the implementation of a lock-free container, but they're not sufficient on their own.

Is STD atomic bool lock-free?

atomic<bool>. std::atomic_flag i s the only atomic data type that is lock-free.

What means lock-free?

At its essence, lock-free is a property used to describe some code, without saying too much about how that code was actually written. Basically, if some part of your program satisfies the following conditions, then that part can rightfully be considered lock-free.


1 Answers

Yes, lock-free atomics are address-free on all C++ implementations on all normal CPUs, and can safely be used on shared-memory between processes. Non-lock-free atomics1won't be safe between processes, though. Each process will have its own hash table of locks (Where is the lock for a std::atomic?).

The C++ standard intends lock-free atomics to work in shared memory between processes, but it can only go as far as "should" without defining terms and so on.

C++draft 29.5 Lock-free property

[ Note: Operations that are lock-free should also be address-free. That is, atomic operations on the same memory location via two different addresses will communicate atomically. The implementation should not depend on any per-process state. This restriction enables communication by memory that is mapped into a process more than once and by memory that is shared between two processes. — end note ]

This is a quality-of-implementation recommendation that is very easy to implement on current hardware, and in fact you'd have to try hard to design a deathstation9000 C++ implementation that violates it on x86 / ARM / PowerPC / other mainstream CPU while actually being lock-free.


The mechanism hardware exposes for atomic read-modify-write operations is based on MESI cache coherency which only cares about physical addresses. x86 lock cmpxchg / lock add / etc. makes a core hang on to a cache line in Modified state so no other core can read/write it in the middle of the atomic operation. (Can num++ be atomic for 'int num'?).

Most non-x86 architectures use LL/SC, which lets you write a retry loop that only does the store if it will be atomic. LL/SC can emulate CAS with O(1) overhead in a wait-free manner without introducing addresses.

C++ lock-free atomics compile to use LL/SC instructions directly. See my answer on the num++ question for x86 examples. See Atomically clearing lowest non-zero bit of an unsigned integer for some examples of AArch64 code-gen for compare_exchange_weak vs fetch_add using LL/SC instructions.

Atomic pure-load or pure-store are easier and happen for free with aligned data. On x86, see Why is integer assignment on a naturally aligned variable atomic on x86? Other ISAs have similar rules.


Related: I included some comments about address-free in my answer on Genuinely test std::atomic is lock-free or not. I'm not sure whether they're helpful or correct. :/


Footnote 1:

All mainstream CPUs have lock-free atomics for objects up to the width of a pointer. Some have wider atomics, like x86 has lock cmpxchg16b, but not all implementations choose to use it for double-width lock-free atomic objects. Check C++17 std::atomic::is_always_lock_free, or ATOMIC_xxx_LOCK_FREE if defined, for compile-time detection.

(Some microcontrollers can't hold a pointer in a single register (or copy it around with a single operation), but there aren't usually multi-core implementations of such ISAs.)


Why on earth would an implementation use non-address-free atomics that are lock-free?

I don't know any plausible reason on hardware that works anything like normal modern CPUs. You could may imagine some architecture where you do atomic operations by submitting the address to some

I think the C++ standard wants to avoid constraining non-mainstream implementations as much as possible. e.g. C++ on top of some kind of interpreter, rather than compiled do machine code for a "normal" CPU architecture.

IDK if you could usefully implement C++ on a loosely-coupled shared memory system like a cluster with ethernet links instead of shared memory, or non-coherent shared memory (that has to be flushed explicitly for other threads to see your stores).

I think it's mostly that the C++ committee can't say much about how atomics must be implemented without assuming that implementations will run programs under an OS where multiple processes can set up shared memory.

They might be imagining some future ISA where address-free atomics aren't possible, but I think more likely they don't want to talk about shared-memory between multiple C++ programs. The standard only requires that an implementation run one program.

Apparently std::atomic_flag is actually guaranteed to be address-free Why only std::atomic_flag is guaranteed to be lock-free?, so IDK why they don't make the same requirement for any atomic<T> that the implementation chooses to implement as lock-free.

like image 189
Peter Cordes Avatar answered Oct 01 '22 15:10

Peter Cordes