
Standardized ways to pack multiple values into one atomic

Assume I have two atomic variables of type int32. I could instead choose to represent them both as a single std::atomic<int64>, reserving the first 32 bits for my first int and the last 32 bits for my second int.

This seems like quite a space & time saver on x64 architectures, not to mention it allows for all sorts of black magic since one can abstract over various operations and make them atomic:

first == a && second == b

becomes

both == ( int64(uint32(a)) | ( int64(b) << 32 ) )
//Or some such... I'm not 100% sure this is correct but you get the idea
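A minimal sketch of that comparison, assuming the first value lives in the low 32 bits and the second in the high 32 bits. The helper names (pack, unpack_first, unpack_second, both_equal) are hypothetical and not part of any standard library. Note that + binds tighter than <<, so the shift needs its own parentheses, and each value goes through an unsigned 32-bit type so a negative number cannot sign-extend into the other half.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helpers. Assumed layout: `first` in the low 32 bits,
// `second` in the high 32 bits of one unsigned 64-bit word.
constexpr std::uint64_t pack(std::int32_t first, std::int32_t second) {
    // Convert through uint32_t so a negative value stays inside its own half.
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(first))
         | (static_cast<std::uint64_t>(static_cast<std::uint32_t>(second)) << 32);
}

// Converting an out-of-range uint32_t back to int32_t is modular since C++20
// and implementation-defined (modular on mainstream compilers) before that.
constexpr std::int32_t unpack_first(std::uint64_t both) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(both));
}

constexpr std::int32_t unpack_second(std::uint64_t both) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(both >> 32));
}

// One atomic load answers "first == a && second == b" for a single snapshot.
inline bool both_equal(const std::atomic<std::uint64_t>& both,
                       std::int32_t a, std::int32_t b) {
    return both.load(std::memory_order_acquire) == pack(a, b);
}

// Small usage example.
inline void pack_example() {
    std::atomic<std::uint64_t> both{pack(3, -4)};
    assert(both_equal(both, 3, -4));
    assert(unpack_second(both.load()) == -4);
}
```

Using an unsigned 64-bit word for the packed value sidesteps shifting negative numbers, which was undefined behaviour before C++20.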

The one problem I see with this trick is that I'm not particularly fond of operating at the bit level, and C++ is not very kind when it comes to bit-level work, especially once you try more complex operations or pack more than two variables (e.g. two numbers and several bools) into the same atomic.

So I'm wondering if there is a standardized way to apply this kind of trick. A pattern or even std functionality that is easily recognizable by other coders when seen and easier to work with for the implementer? Likewise, is this pattern useful enough to warrant such a standardization, or does its usefulness quickly become obsolete when compared to the possible annoyances and UB it can bring?

George asked Mar 05 '26 22:03


2 Answers

C++ supports aggregate types like structs, including assignment of the whole struct.
struct foo { int a,b; } is a 64-bit type containing two members (assuming the usual 32-bit int).

std::atomic<foo> is lock-free with mainstream compilers, exactly the same as std::atomic<int64_t>.

As Alex's answer points out, you should avoid padding for alignment between members (or after the last member), and make sure the size is a power of 2. (2, 4, or 8 bytes. Or 16 bytes if your C++ implementation can efficiently make that lock-free, like on some AArch64, and recent x86-64 with up-to-date library support that uses 128-bit load/store on CPUs with AVX.)
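As a rough sketch of the struct approach, assuming a two-member type like the one in the question: the names Pair and replace_second_if are made up for illustration. The static_asserts encode the "no padding, power-of-2 size" advice, and the compare-exchange updates one member only if the whole pair still matches.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical pair type; fixed-width members make the 8-byte size explicit.
struct Pair {
    std::int32_t first;
    std::int32_t second;
};

static_assert(sizeof(Pair) == 8, "expected no padding and a power-of-2 size");
static_assert(std::is_trivially_copyable<Pair>::value,
              "std::atomic<T> requires a trivially copyable T");

// Atomically set `second` to `value`, but only if the stored pair still
// equals `expected`. Returns true when the update happened.
inline bool replace_second_if(std::atomic<Pair>& shared, Pair expected,
                              std::int32_t value) {
    const Pair desired{expected.first, value};
    return shared.compare_exchange_strong(expected, desired,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire);
}

// Small usage example.
inline void pair_example() {
    std::atomic<Pair> shared{Pair{1, 2}};
    assert(replace_second_if(shared, Pair{1, 2}, 5));   // matches, so updates
    assert(!replace_second_if(shared, Pair{1, 2}, 9));  // stale expectation
    assert(shared.load().second == 5);
}
```

Whether this is actually lock-free on a given target can be checked at run time with shared.is_lock_free(), or at compile time in C++17 with std::atomic<Pair>::is_always_lock_free.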


If you want efficient read-only access to just the first or second member (like for a pointer + ABA counter), C++ is bad at that, or compilers are bad at optimizing auto tmp = shared.load(acquire); / use only tmp.a (or tmp.ptr or whatever you called the first member). It would be safe on normal ISAs to make asm that really only loaded the first member, not the whole object, and that's potentially much cheaper (e.g. for the first 8 bytes of a 16-byte object on x86-64.)

You can hack something up with std::atomic_ref that will technically have UB but pretty reliably compile to safe asm. Just make sure all writes write the full object. Or before C++20, see "How can I implement ABA counter with c++11 CAS?" for the same hack with a union of a 16-byte atomic object with a struct of two 8-byte atomic halves. That Q&A also has some details about x86 lock cmpxchg16b, which until recently was x86-64's only documented / guaranteed safe way to do a lock-free atomic .load() of 16 bytes. (Now, it's documented that a 16-byte SIMD load is atomic, but that still means you have to bounce the data from an XMM reg back to a GPR. So it's still worth optimizing a load to just load the 8 bytes you want if you don't need a snapshot of the whole object.)

Peter Cordes answered Mar 08 '26 13:03


As already mentioned in the answer by @Peter Cordes, you would want to use a struct directly in the atomic, like struct S { ... }; std::atomic<S> ....

To make sure the struct does not add overhead compared to an integer, consider the following:

  • Avoid padding bits. Starting in C++20, the implementation has to ignore padding bits during the comparison in compare_exchange_*, see p0528. This is not efficient on some architectures, including x86, and a std::atomic implementation may fail to make it efficient even with an LL/SC implementation of compare_exchange_*.
  • Avoid odd sizeof. Some STL implementations may not pad the structure up to the size of uint64_t, resulting in a non-lock-free atomic. And of course you don't want to exceed 64 bits. So consider a static_assert on the structure size.
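The two checks above could be written as compile-time tests along these lines. This is only a sketch: packs_cleanly and the three example structs are hypothetical names, and it assumes C++17 for std::has_unique_object_representations, which is false for types with padding (and also for types with floating-point members, so the check is conservative).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical check: true when T has no padding and a small power-of-2 size.
template <class T>
constexpr bool packs_cleanly() {
    return std::is_trivially_copyable<T>::value
        && std::has_unique_object_representations<T>::value
        && (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 || sizeof(T) == 8);
}

// Two numbers plus a flags field: 4 + 2 + 2 bytes, no padding expected.
struct State {
    std::int32_t count;
    std::uint16_t index;
    std::uint16_t flags;
};

// On typical ABIs this has 3 bytes of tail padding after `tag`.
struct Padded {
    std::int32_t count;
    std::int8_t tag;
};

// No padding, but sizeof is 3, which is not a power of 2.
struct Odd {
    std::uint8_t bytes[3];
};

// Guard a type before putting it in std::atomic.
static_assert(packs_cleanly<State>(), "State should fit one 64-bit atomic");

// Small usage example.
inline void layout_example() {
    std::atomic<State> shared{State{1, 2, 3}};
    assert(shared.load().flags == 3);
}
```

Placing the static_assert next to the struct definition means a later member addition that introduces padding or grows the type fails at compile time instead of silently degrading the atomic.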

An underlying integer type has the advantage of arithmetic (add) and logic (and, not, xor) operations, which can be useful for manipulating sub-integers. (Arithmetic may overflow into the neighbouring field, but this is not a problem for the highest sub-integer.)
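For illustration, a sketch of sub-integer manipulation on a plain std::atomic<std::uint64_t>, assuming a counter in the high 32 bits and a value in the low 32 bits. The constants and function names are hypothetical. Adding 1 << 32 bumps only the high field, and if that field overflows, the carry falls off the top of the word instead of corrupting the low field.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed layout: low 32 bits = value, high 32 bits = counter.
constexpr std::uint64_t kCounterOne = std::uint64_t{1} << 32;
constexpr std::uint64_t kLowMask = 0xFFFFFFFFull;

// Atomically increment the high sub-integer and return the new packed word.
// fetch_add returns the old value, so the increment is re-applied locally.
inline std::uint64_t increment_counter(std::atomic<std::uint64_t>& both) {
    return both.fetch_add(kCounterOne, std::memory_order_acq_rel) + kCounterOne;
}

// Atomically toggle bits of the low sub-integer without touching the counter.
inline void toggle_low_bits(std::atomic<std::uint64_t>& both, std::uint32_t bits) {
    both.fetch_xor(static_cast<std::uint64_t>(bits), std::memory_order_acq_rel);
}

// Small usage example.
inline void arithmetic_example() {
    std::atomic<std::uint64_t> both{5};  // counter = 0, value = 5
    const std::uint64_t after = increment_counter(both);
    assert((after >> 32) == 1);
    assert((after & kLowMask) == 5);
}
```

The same arithmetic on the low field would need care, because a carry out of it lands in the counter.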

Alex Guteniev answered Mar 08 '26 14:03


