
How do you use the pause assembly instruction in 64-bit C++ code?

Since inline assembly is not supported by VC++ 2010 in 64-bit code, how do I get the x86-64 pause instruction into my code? There does not appear to be an intrinsic for it, as there is for many other common assembly instructions (e.g., __rdtsc(), __cpuid(), etc.).

As for why: I want the instruction for a busy-wait use case, so that the (hyperthreaded) CPU remains available to other threads running on the same core (see: Performance Insights at intel.com). The pause instruction is very helpful for this use case as well as for spin-lock implementations, so I can't understand why MS did not include it as an intrinsic.

Thanks

Michael Goldshteyn asked Apr 29 '11 14:04



People also ask

What does the pause instruction do?

The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance.

What is _mm_pause?

The _mm_pause intrinsic emits the pause instruction, which hints to the processor that the calling thread is in a "spin-wait" loop. In addition, the pause instruction is a no-op on x86 processors that do not support Intel SSE2, meaning it still executes there without doing anything or raising a fault.


1 Answer

Wow, this was a very hard problem to track down, but in case anybody else needs the x86-64 pause instruction:

The YieldProcessor() macro from windows.h expands to the undocumented _mm_pause intrinsic, which ultimately compiles to the pause instruction in both 32-bit and 64-bit code.

This is completely undocumented, by the way; MSDN has only partial documentation for YieldProcessor(), and what is there is incorrect for VC++ 2010.
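For the busy-wait use case from the question, a minimal sketch of a bounded spin-wait built on _mm_pause might look like the following. It assumes a C++11 compiler (newer than VC++ 2010, which lacks <atomic>); cpu_pause and spin_until_set are hypothetical helper names, and on Windows YieldProcessor() could be called in place of _mm_pause().

```cpp
#include <atomic>

#if defined(_M_X64) || defined(_M_IX86) || defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h>  // SSE2 intrinsics header, declares _mm_pause
// Hypothetical helper: emits the pause instruction on x86 / x86-64.
inline void cpu_pause() { _mm_pause(); }
#else
// Assumption: on non-x86 targets this sketch simply does nothing.
inline void cpu_pause() {}
#endif

// Hypothetical helper: polls `flag` up to `max_polls` times, pausing between
// polls. Returns true if the flag was observed set, false if the spin budget
// ran out first.
inline bool spin_until_set(const std::atomic<bool>& flag, unsigned long max_polls) {
    for (unsigned long i = 0; i < max_polls; ++i) {
        if (flag.load(std::memory_order_acquire)) {
            return true;
        }
        cpu_pause();  // hint to the core that this is a spin-wait loop
    }
    // One last check so a flag set during the final pause is not missed.
    return flag.load(std::memory_order_acquire);
}
```

In practice another thread would set the flag with flag.store(true, std::memory_order_release). The bounded spin count is a design choice for this sketch: a caller can fall back to a blocking wait when spin_until_set returns false instead of spinning indefinitely.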

Here is an example of what a block of YieldProcessor() macros compiles into:

    19:     ::YieldProcessor();
000000013FDB18A0 F3 90                pause  
    20:     ::YieldProcessor();
000000013FDB18A2 F3 90                pause  
    21:     ::YieldProcessor();
000000013FDB18A4 F3 90                pause  
    22:     ::YieldProcessor();
000000013FDB18A6 F3 90                pause  
    23:     ::YieldProcessor();
000000013FDB18A8 F3 90                pause  

By the way, each pause instruction seems to produce a delay of about 9 cycles on average on the Nehalem architecture (i.e., roughly 3 ns on a 3.3 GHz CPU).
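A rough way to reproduce a per-pause estimate like this is to time a loop of pauses with the __rdtsc() intrinsic mentioned in the question. This is only a sketch, not necessarily how the figure above was measured: average_pause_ticks is a hypothetical name, the result includes loop overhead, it counts time-stamp counter ticks (which need not equal core clock cycles), and the value varies considerably between microarchitectures.

```cpp
#include <cstdint>

#if defined(_M_X64) || defined(_M_IX86) || defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h>   // _mm_pause
#if defined(_MSC_VER)
#include <intrin.h>      // __rdtsc on MSVC
#else
#include <x86intrin.h>   // __rdtsc on GCC / Clang
#endif

// Hypothetical helper: average time-stamp counter ticks per pause,
// loop overhead included.
inline double average_pause_ticks(std::uint64_t iterations) {
    if (iterations == 0) {
        return 0.0;
    }
    const std::uint64_t start = __rdtsc();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        _mm_pause();
    }
    const std::uint64_t stop = __rdtsc();
    return static_cast<double>(stop - start) / static_cast<double>(iterations);
}
#else
// Assumption: on non-x86 targets there is nothing to measure in this sketch.
inline double average_pause_ticks(std::uint64_t) { return 0.0; }
#endif
```

Calling average_pause_ticks(1000000) a few times and taking the smallest result helps reduce noise from interrupts and scheduling.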

Michael Goldshteyn answered Sep 21 '22 00:09
