 

Why is CPUID + RDTSC unreliable?

I am trying to profile code for execution time on an x86-64 processor. I am referring to this Intel white paper and have also gone through other SO threads discussing RDTSCP vs. CPUID+RDTSC here and here.

In the above-mentioned white paper, the CPUID+RDTSC method is termed unreliable, and this is also demonstrated with statistics.

What might be the reason for CPUID+RDTSC being unreliable?

Also, the graphs in Figure 1 (minimum value behavior) and Figure 2 (variance behavior) in the same white paper show a "square wave" pattern. What explains such a pattern?

asked Dec 24 '18 at 00:12 by talekeDskobeDa
1 Answer

I think they're finding that CPUID inside the measurement interval causes extra variability in the total time. Their proposed fix in 3.2 Improvements Using RDTSCP Instruction highlights the fact that there's no CPUID inside the timed interval when they use CPUID / RDTSC to start, and RDTSCP/CPUID to stop.
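For reference, here is a minimal sketch of that pattern in GNU C inline asm for x86-64 (the names start_timer/stop_timer are illustrative, not from the white paper): CPUID runs before the starting RDTSC, and the stopping RDTSCP is followed by CPUID, so no CPUID executes inside the timed interval itself.

```c
#include <stdint.h>

/* Sketch of the white paper's section-3.2 pattern, assuming GNU C inline asm
 * on x86-64.  start_timer()/stop_timer() are illustrative names. */
static inline uint64_t start_timer(void)
{
    uint32_t lo, hi;
    asm volatile("cpuid\n\t"          /* barrier: wait for earlier insns to retire */
                 "rdtsc"              /* then take the start timestamp */
                 : "=a"(lo), "=d"(hi)
                 :                    /* CPUID leaf = whatever happens to be in EAX, as in the paper */
                 : "rbx", "rcx", "memory");  /* "memory" stops the compiler moving loads/stores across */
    return ((uint64_t)hi << 32) | lo;
}

static inline uint64_t stop_timer(void)
{
    uint32_t lo, hi;
    asm volatile("rdtscp\n\t"         /* stop timestamp: waits for prior insns to finish */
                 "mov %%eax, %0\n\t"  /* save EDX:EAX before the trailing CPUID clobbers them */
                 "mov %%edx, %1\n\t"
                 "cpuid"              /* keep later insns from starting early; runs outside the timed interval */
                 : "=r"(lo), "=r"(hi)
                 :
                 : "rax", "rbx", "rcx", "rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```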

Perhaps they could have ensured EAX=0 or EAX=1 before executing CPUID, to choose which CPUID leaf of data to read (http://www.sandpile.org/x86/cpuid.htm#level_0000_0000h), in case the time CPUID takes depends on which query you make. Other than that, I'm unsure why CPUID inside the timed interval would add so much variance.
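A sketch of what that could look like, building on the start_timer sketch above (this is a guess at a fix, not the paper's actual code): the only change is an input constraint pinning the CPUID leaf.

```c
#include <stdint.h>

/* Same start barrier as above, but with the CPUID leaf pinned to 0 via an
 * input constraint, so every measurement runs the same (cheap) query. */
static inline uint64_t start_timer_leaf0(void)
{
    uint32_t lo, hi;
    asm volatile("cpuid\n\t"
                 "rdtsc"
                 : "=a"(lo), "=d"(hi)
                 : "a"(0)             /* request CPUID leaf 0 explicitly */
                 : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```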

Or better, use lfence instead of cpuid to serialize OoO exec without it being a fully serializing operation.
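As a sketch (assuming an Intel CPU, or an AMD CPU where lfence is dispatch-serializing), an lfence-ordered timestamp read could look like this:

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(), _mm_lfence() */

/* lfence blocks later instructions from executing until earlier ones have
 * completed locally, which is enough to keep rdtsc in place without cpuid's
 * much higher and more variable cost. */
static inline uint64_t rdtsc_ordered(void)
{
    _mm_lfence();              /* wait for preceding instructions to finish */
    uint64_t tsc = __rdtsc();  /* read the time-stamp counter */
    _mm_lfence();              /* stop later instructions from starting before rdtsc */
    return tsc;
}
```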


Note that the inline asm in Intel's whitepaper sucks: there's no need for those mov instructions if you use proper output constraints like "=a"(low), "=d"(high). See How to get the CPU cycle count in x86_64 from C++? for better ways.
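For example, a bare rdtsc read with output constraints needs no mov at all (a sketch with no serialization around it; rdtsc_raw is just an illustrative name):

```c
#include <stdint.h>

/* rdtsc leaves the timestamp in EDX:EAX; the "=a"/"=d" constraints let the
 * compiler pick those values up directly, with no extra mov instructions. */
static inline uint64_t rdtsc_raw(void)
{
    uint32_t lo, hi;
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```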

answered Jan 03 '23 at 07:01 by Peter Cordes