I am trying to profile code for execution time on an x86-64 processor. I am referring to this Intel white paper and have also gone through other SO threads discussing RDTSCP vs CPUID+RDTSC here and here.
In the above-mentioned white paper, the CPUID+RDTSC method is termed unreliable, and this is backed up with statistics.
What might be the reason for CPUID+RDTSC being unreliable?
Also, the graphs in Figure 1 (Minimum value Behavior graph) and Figure 2 (Variance Behavior graph) in the same white paper show a "square wave" pattern. What explains such a pattern?
I think they're finding that CPUID inside the measurement interval causes extra variability in the total time. Their proposed fix in 3.2 Improvements Using RDTSCP Instruction highlights the fact that there's no CPUID inside the timed interval when they use CPUID/RDTSC to start and RDTSCP/CPUID to stop.
Perhaps they could have ensured EAX=0 or EAX=1 before executing CPUID, to choose which CPUID leaf of data to read (http://www.sandpile.org/x86/cpuid.htm#level_0000_0000h), in case the time CPUID takes depends on which query you make. Other than that, I'm not sure what would explain it.
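Roughly, that start/stop pattern looks like this in GNU C inline asm. This is a sketch, not the whitepaper's exact code: the function names are mine, EAX is pinned to leaf 0 as suggested above, and the output constraints already follow the note further down about avoiding the extra mov instructions.

```c
#include <stdint.h>

/* Start of the timed region: CPUID serializes, then RDTSC reads the TSC.
 * No CPUID executes inside the interval itself. */
static inline uint64_t tsc_start(void)
{
    uint32_t lo, hi;
    asm volatile("cpuid\n\t"              /* wait for all earlier instructions      */
                 "rdtsc"                  /* then sample the TSC into EDX:EAX       */
                 : "=a"(lo), "=d"(hi)
                 : "a"(0)                 /* pin the CPUID leaf (EAX=0)             */
                 : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* End of the timed region: RDTSCP waits for prior instructions, then a
 * separate CPUID keeps later code from being reordered into the interval. */
static inline uint64_t tsc_stop(void)
{
    uint32_t lo, hi;
    asm volatile("rdtscp"                 /* waits for earlier instructions to finish */
                 : "=a"(lo), "=d"(hi)
                 :
                 : "rcx", "memory");      /* RDTSCP also writes IA32_TSC_AUX to ECX   */
    asm volatile("cpuid"                  /* barrier against later code               */
                 :
                 :
                 : "rax", "rbx", "rcx", "rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

The "memory" clobbers are there so the compiler doesn't move the code under test across the timing statements.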
Or better, use lfence instead of cpuid to serialize OoO exec without being a full serializing operation.
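A minimal sketch of the lfence variant (GNU C inline asm assumed; note that on AMD, lfence only blocks dispatch of later instructions when the relevant MSR bit is set, which current kernels enable as part of Spectre mitigations):

```c
#include <stdint.h>

/* Read the TSC with LFENCE on both sides: rdtsc can't execute before
 * earlier instructions complete, and later instructions can't start
 * before rdtsc has executed. No full serialization like CPUID. */
static inline uint64_t rdtsc_fenced(void)
{
    uint32_t lo, hi;
    asm volatile("lfence\n\t"
                 "rdtsc\n\t"
                 "lfence"
                 : "=a"(lo), "=d"(hi)
                 :
                 : "memory");     /* keep the compiler from reordering the measured code */
    return ((uint64_t)hi << 32) | lo;
}
```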
Note that the inline asm in Intel's whitepaper sucks: there's no need for those mov instructions if you use proper output constraints like "=a"(low), "=d"(high). See How to get the CPU cycle count in x86_64 from C++? for better ways.
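For example, with the __rdtsc / __rdtscp intrinsics you don't have to write any asm at all. Sketch for GCC/clang; keep in mind the intrinsics by themselves don't add the CPUID/lfence ordering barriers discussed above:

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* GCC/clang; MSVC has the same intrinsics in <intrin.h> */

int main(void)
{
    unsigned aux;
    uint64_t t0 = __rdtsc();          /* compiler emits rdtsc and joins EDX:EAX for you  */
    /* ... code under test ... */
    uint64_t t1 = __rdtscp(&aux);     /* rdtscp; aux gets IA32_TSC_AUX (core/socket id)  */
    printf("elapsed: %llu reference cycles\n", (unsigned long long)(t1 - t0));
    return 0;
}
```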