
In C++, is there any effective difference between an acquire/release atomic access and a relaxed access combined with a fence?

Specifically, is there any effective difference between:

i = a.load(memory_order_acquire);

or

a.store(5, memory_order_release);

and

atomic_thread_fence(memory_order_acquire);
i = a.load(memory_order_relaxed);

or

a.store(5, memory_order_relaxed);
atomic_thread_fence(memory_order_release);

respectively?

Do non-relaxed atomic accesses provide signal fences as well as thread fences?

WaltK asked Mar 09 '23 21:03


2 Answers

In your code, for both the load and the store, the order of the fence and the atomic operation should be reversed. With that change the fence-based versions are similar to the standalone acquire/release operations, but there are differences.

Acquire and release operations on atomic variables act as one-way barriers, but in opposite directions. That is, a store/release operation prevents memory operations that precede it (in the program source) from being reordered after it, while a load/acquire operation prevents memory operations that follow it from being reordered before it.

// thread 1
// shared memory operations A
a.store(5, std::memory_order_release);

x = 42; // regular int


// thread 2
while (a.load(std::memory_order_acquire) != 5);
// shared memory operations B

Memory operations A cannot move down below the store/release, while memory operations B cannot move up above the load/acquire. As soon as thread 2 reads 5, memory operations A are visible to B and synchronization is complete.
Because the store/release is only a one-way barrier, the write to x can move up to join, or even precede, memory operations A. But since that write is not part of the acquire/release relationship, x cannot be reliably accessed by thread 2.
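
For illustration, the acquire/release pattern above could be made runnable roughly as follows. This is only a sketch: the names payload, flag and message_passing_acq_rel are hypothetical, with payload standing in for "shared memory operations A" and flag for the atomic a.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: `payload` stands in for "shared memory operations A",
// `flag` for the atomic variable `a` used in the snippets above.
int message_passing_acq_rel() {
    int payload = 0;              // plain, non-atomic shared data
    std::atomic<int> flag{0};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                // operations A
        flag.store(5, std::memory_order_release);    // A cannot sink below this store
    });

    std::thread consumer([&] {
        // Spin until the released value is observed.
        while (flag.load(std::memory_order_acquire) != 5) {
            std::this_thread::yield();
        }
        seen = payload;                              // operations B: guaranteed to see 42
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling message_passing_acq_rel() from main should return 42, because the consumer only reads payload after its acquire load has observed the value written by the release store.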

Replacing the atomic operations with standalone thread fences and relaxed operations is similar:

// thread 1
// shared memory operations A
std::atomic_thread_fence(std::memory_order_release);
a.store(5, std::memory_order_relaxed);


// thread 2
while (a.load(std::memory_order_relaxed) != 5);
std::atomic_thread_fence(std::memory_order_acquire);
// shared memory operations B

This achieves the same result, but an important difference is that neither fence acts as a one-way barrier. If they did, the atomic store to a could be reordered before the release fence and the atomic load from a could be reordered after the acquire fence, and that would break the synchronization relationship.

In general:

  • A standalone release fence prevents preceding operations from being reordered with (atomic) stores that follow it.
  • A standalone acquire fence prevents following operations from being reordered with (atomic) loads that precede it.
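
A self-contained sketch of the two fence rules above, under the same assumptions as the snippets in this answer (payload, flag and message_passing_fences are hypothetical names): the release fence sits before the relaxed store, and the acquire fence sits after the relaxed load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper showing fence-based message passing.
int message_passing_fences() {
    int payload = 0;              // plain, non-atomic shared data
    std::atomic<int> flag{0};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                          // preceding operation
        std::atomic_thread_fence(std::memory_order_release);   // orders it before the store below
        flag.store(5, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (flag.load(std::memory_order_relaxed) != 5) {
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);   // orders the load above before what follows
        seen = payload;                                        // following operation
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If the fence and the atomic operation were swapped in either thread, as in the question, this guarantee would be lost and reading payload would be a data race.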

The standard allows acquire/release fences to be mixed with acquire/release operations.
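
As a sketch of such mixing (the names are again hypothetical), a release fence followed by a relaxed store in one thread can synchronize with a plain acquire load in the other:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: release *fence* on the writer side,
// acquire *operation* on the reader side.
int message_passing_mixed() {
    int payload = 0;
    std::atomic<int> flag{0};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(5, std::memory_order_relaxed);    // store sequenced after the fence
    });

    std::thread consumer([&] {
        // The acquire load that reads 5 synchronizes with the release fence.
        while (flag.load(std::memory_order_acquire) != 5) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The opposite combination, a release store paired with a relaxed load followed by an acquire fence, should work the same way.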

Do non-relaxed atomic accesses provide signal fences as well as thread fences?

It is not fully clear to me what you are asking here, because thread fences are normally used with relaxed atomic operations. That said, std::atomic_signal_fence is similar to std::atomic_thread_fence, except that it only orders operations between a thread and a signal handler running in that same thread, so the compiler does not generate CPU instructions for inter-thread synchronization. It basically acts as a compiler-only barrier.
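
A minimal sketch of where a signal fence might be used, assuming a handler that runs in the same thread that raised the signal. The names g_payload, g_ready, g_seen, on_signal and signal_fence_demo are hypothetical, and std::atomic<int> is assumed to be lock-free on the target platform so that it may be touched from the handler.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

int g_payload = 0;                 // plain data published to the handler
std::atomic<int> g_ready{0};       // assumed lock-free on this platform
int g_seen = -1;

extern "C" void on_signal(int) {
    if (g_ready.load(std::memory_order_relaxed) == 1) {
        // Compiler-only barrier: keeps the read below after the flag check.
        std::atomic_signal_fence(std::memory_order_acquire);
        g_seen = g_payload;
    }
}

int signal_fence_demo() {
    std::signal(SIGINT, on_signal);
    g_payload = 42;
    // Compiler-only barrier: keeps the write above before the flag store.
    std::atomic_signal_fence(std::memory_order_release);
    g_ready.store(1, std::memory_order_relaxed);
    std::raise(SIGINT);            // handler runs synchronously in this thread
    std::signal(SIGINT, SIG_DFL);  // restore the default disposition
    return g_seen;
}
```

Since no other thread is involved, the fences only need to stop the compiler from reordering the accesses. They are not a substitute for std::atomic_thread_fence when a second thread reads g_payload.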

LWimsey answered Mar 12 '23 23:03


You need

atomic_thread_fence(memory_order_release);
a.store(5, memory_order_relaxed);

and

i = a.load(memory_order_relaxed);
atomic_thread_fence(memory_order_acquire);

to replace

a.store(5, memory_order_release);

and

i = a.load(memory_order_acquire);

Non-relaxed atomic accesses do provide signal fences as well as thread fences.

cshu answered Mar 12 '23 23:03