 

atomic<T>.load() with std::memory_order_release

When writing C++11 code that uses the newly introduced thread-synchronization primitives with memory orderings weaker than the sequentially consistent default, you usually see either

std::atomic<int> vv;
int i = vv.load(std::memory_order_acquire);

or

vv.store(42, std::memory_order_release);

It is clear to me why this makes sense.

My questions are: Do the combinations vv.store(42, std::memory_order_acquire) and vv.load(std::memory_order_release) also make sense? In which situation could one use them? What are the semantics of these combinations?

Toby Brull asked May 10 '26 03:05



2 Answers

That's simply not allowed. The C++11 standard restricts which memory order arguments you may pass to load and store operations.

For load (§29.6.5):

Requires: The order argument shall not be memory_order_release nor memory_order_acq_rel.

For store:

Requires: The order argument shall not be memory_order_consume, memory_order_acquire, nor memory_order_acq_rel.
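As a minimal sketch of what those two requirements leave available: stores may take relaxed, release, or seq_cst; loads may take relaxed, consume, acquire, or seq_cst; and acq_rel is reserved for read-modify-write operations such as exchange. The helper names publish, observe, swap_in, and demo are hypothetical and only serve the illustration. Passing one of the forbidden orders violates a precondition, so the behavior is undefined rather than given some alternative meaning, although some compilers diagnose it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helpers for illustration; only the std::atomic calls are standard.
std::atomic<int> vv{0};

// A store may use relaxed, release, or seq_cst.
void publish(int value) {
    vv.store(value, std::memory_order_release);
}

// A load may use relaxed, consume, acquire, or seq_cst.
int observe() {
    return vv.load(std::memory_order_acquire);
}

// A read-modify-write is both a load and a store, so it is the kind of
// operation that accepts memory_order_acq_rel. Returns the previous value.
int swap_in(int value) {
    return vv.exchange(value, std::memory_order_acq_rel);
}

// Single-threaded walk-through of the three helpers.
void demo() {
    publish(42);
    assert(observe() == 42);
    assert(swap_in(7) == 42);  // exchange hands back the old value
}
```

If a single operation needs both acquire and release semantics, a read-modify-write such as exchange or fetch_add is the tool for it, not a plain load or store.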

Mat answered May 12 '26 17:05



The C/C++/LLVM memory model is sufficient for synchronization strategies that ensure data is ready to be accessed before accessing it. While that covers most common synchronization primitives, useful properties can be obtained by building consistent models on weaker guarantees.

The most prominent example is the seqlock. It relies on "speculatively" reading data that may not be in a consistent state. Because reads are allowed to race with writes, readers don't block writers -- a property the Linux kernel uses to allow the system clock to be updated even if a user process is repeatedly reading it. Another strength of the seqlock is that on modern SMP architectures it scales very well with the number of readers: because the readers don't need to take any locks, they only need shared access to the cache lines.

The ideal implementation of a seqlock would use something like a "release load" in the reader, which is not available in any major programming language. The kernel works around this with a full read fence, which scales well across architectures, but doesn't achieve optimal performance.
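For comparison, here is a rough sketch of how the fence-based workaround can be expressed in standard C++11. It is not the kernel's code: the SeqlockPair type and its members are hypothetical, it assumes a single writer, and the payload fields are relaxed atomics so that the racing reads are not data races under the C++ memory model. The acquire fence placed before the second read of the sequence counter stands in for the "release load" that the language does not offer.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical single-writer seqlock protecting a pair of ints.
struct SeqlockPair {
    std::atomic<unsigned> seq{0};  // even: stable, odd: write in progress
    std::atomic<int> a{0};
    std::atomic<int> b{0};

    // Assumes exactly one writer thread; concurrent writers need extra locking.
    void write(int x, int y) {
        unsigned s = seq.load(std::memory_order_relaxed);
        assert((s & 1u) == 0);                         // no write already in progress
        seq.store(s + 1, std::memory_order_relaxed);   // mark the data as unstable
        std::atomic_thread_fence(std::memory_order_release);
        a.store(x, std::memory_order_relaxed);
        b.store(y, std::memory_order_relaxed);
        seq.store(s + 2, std::memory_order_release);   // publish the new values
    }

    // Readers never write shared state; they retry if a write overlapped.
    std::pair<int, int> read() const {
        for (;;) {
            unsigned s0 = seq.load(std::memory_order_acquire);
            int x = a.load(std::memory_order_relaxed);
            int y = b.load(std::memory_order_relaxed);
            // Orders the data loads above before the re-read of seq below.
            std::atomic_thread_fence(std::memory_order_acquire);
            unsigned s1 = seq.load(std::memory_order_relaxed);
            if (s0 == s1 && (s0 & 1u) == 0) {
                return {x, y};
            }
        }
    }
};
```

A reader calls read() in a loop-free fashion from its own point of view; the retry only triggers when a write overlapped the read, and the writer is never blocked by readers.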

Kaz Wesley answered May 12 '26 18:05



