I am looking at an open source C++ project which has the following code structure:
while(true) {
// Do something work
if(some_condition_becomes_true)
break;
__asm volatile ("pause" ::: "memory");
}
What does the last statement do? I understand that __asm
means that it is an assembly instruction and I found some posts about pause
instruction which say that the thread effectively hints the core to release resources and give other thread more resources (in context of hyper-threading). But what does :::
do and what does memory
do?
If the C code that follows the asm makes no use of any of the output operands, use volatile for the asm statement to prevent the optimizers from discarding the asm statement as unneeded (see Volatile).
the pause instruction gives a hint to the processor that the calling thread is in a "spin-wait" loop. In addition, the pause instruction is a no-op when used on x86 architectures that do not support Intel SSE2, meaning it will still execute without doing anything or raising a fault.
It's _mm_pause()
and a compile memory barrier wrapped into one GNU C Extended ASM statement. https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
asm("" ::: "memory")
prevents compile-time reordering of memory operations across it, like C++11 std::atomic_signal_fence(std::memory_order_seq_cst)
. (not atomic_thread_fence
; although on x86 preventing compile-time reordering is sufficient to make it an acquire + release fence because the only run-time reordering that x86 allows is StoreLoad.) See Jeff Preshing's Memory Ordering at Compile Time article.
Making the asm instruction part non-empty also means those asm instructions will run every time the C logically runs that source line (because it's volatile
).
pause
prevents speculative loads from causing memory-ordering mis-speculation pipeling clears (aka machine nukes). It's useful inside spin loops that are waiting to see a value in memory.
You might find this statement inside a spinloop written without C++11 std::atomic, to tell the compiler it has to re-read the value of a global variable. (Because the "memory"
clobber means the compiler has to assume the asm statement might have modified the value of any globally-reachable memory.)
This looks like the context where you found it: some_condition_becomes_true
probably includes reading a non-atomic
/ non-volatile
global.
The C++11 equivalent of your loop:
#include <atomic>
#include <immintrin.h>
std::atomic<int> flag;
void wait_for_flag(void) {
while(flag.load(std::memory_order_seq_cst == 0) {
_mm_pause();
}
}
(Not exactly equivalent, because your version has a full compiler barrier while mine only has a seq-cst load, so it's not a full signal-fence. But probably what wasn't needed, and they just used something stronger than necessary to get the effect of volatile).
Without the barrier or making flag
atomic, the compiler would have optimized it to:
// Do something work
if(some_condition_becomes_true) {
// empty
} else {
while(true) {
// Do something work
__asm volatile ("pause" ::: ); // no memory clobber
}
}
i.e. it would hoist the check on some_condition_becomes_true
out of the loop and not re-read the global every time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With