Sometimes I encounter code that reads TSC with rdtsc
instruction, but calls cpuid
right before.
Why is calling cpuid
necessary? I realize it may have something to do with different cores having TSC values, but what exactly happens when you call those two instructions in sequence?
It's to prevent out-of-order execution. From a link that has now disappeared from the web (but which was fortuitously copied here before it disappeared), this text is from an article entitled "Performance monitoring" by one John Eckerdal:
The Pentium Pro and Pentium II processors support out-of-order execution instructions may be executed in another order as you programmed them. This can be a source of errors if not taken care of.
To prevent this the programmer must serialize the the instruction queue. This can be done by inserting a serializing instruction like CPUID instruction before the RDTSC instruction.
Two reasons:
CPUID is serializing, preventing out-of-order execution of RDTSC.
These days you can safely use LFENCE instead. It's documented as serializing on the instruction stream (but not stores to memory) on Intel CPUs, and now also on AMD after their microcode update for Spectre.
https://hadibrais.wordpress.com/2018/05/14/the-significance-of-the-x86-lfence-instruction/ explains more about LFENCE.
See also https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf for a way to use RDTSCP that keeps CPUID (or LFENCE) out of the timed region:
LFENCE ; (or CPUID) Don't start the timed region until everything above has executed
RDTSC ; EDX:EAX = timestamp
mov ebx, eax ; low 32 bits of start time
code under test
RDTSCP ; built-in one way barrier stops it from running early
LFENCE ; (or CPUID) still use a barrier after to prevent anything weird
sub eax, ebx ; low 32 bits of end-start
See also Get CPU cycle count? for more about RDTSC caveats, like constant_tsc and nonstop_tsc.
As a bonus, RDTSCP gives you a core ID. You could use RDTSCP for the start time as well, if you want to check for core migration. But if your CPU has the constant_tsc
features, all cores in the package should have their TSCs synced so you typically don't need this on modern x86.
You could get the core ID from CPUID instead, as @Tony's answer points out.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With