
Better than 100ns resolution timers in Windows

Tags:

intel

I work on a programming language profiler and I am looking for a timer solution for Windows with better than 100 ns resolution.

  • QueryPerformanceCounter should be an answer, but the frequency returned by QueryPerformanceFrequency is 10 MHz on Windows 10 and even lower on Windows 7

  • GetSystemTimePreciseAsFileTime has a 100 ns tick/step

  • RDTSC has resolution better than 1ns, but it varies with frequency

My target resolution is at least 10 ns.

What is currently the best solution?

How is QueryPerformanceCounter implemented? Can it be easily disassembled and its resolution increased?

Is it somehow possible to use RDTSC directly and track (or be interrupted on) every frequency change?

asked Aug 01 '20 at 12:08 by mvorisek


1 Answer

How is QueryPerformanceCounter implemented?

The QPC timer has different implementations in the HAL depending on the hardware: it can be backed by the TSC, HPET, RTC, APIC, ACPI or 8254 timer, depending on what is available.

The QPC timer resolution is hardcoded to 100 ns (hence the 10 MHz that QueryPerformanceFrequency reports on Windows 10). But that doesn't matter, because a call to QPC itself takes more than 100 ns. 100 ns is a very, very short amount of time in the Windows world.

RDTSC has resolution better than 1ns, but it varies with frequency

Not really: the TSC frequency has been pretty stable since Nehalem. See the Intel 64 and IA-32 Architectures Software Developer's Manual, Vol. 3A, "17.16 Invariant TSC":

Processor families increment the time-stamp counter differently:

  • For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4 processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]); and for P6 family processors: the time-stamp counter increments with every internal processor clock cycle. The internal processor clock cycle is determined by the current core-clock to bus-clock ratio. Intel SpeedStep technology transitions may also impact the processor clock.

  • For Intel Xeon processors (family [0FH], models [03H and higher]); for Intel Core Solo and Intel Core Duo processors (family [06H], model [0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors (family [06H], model [0FH]); for Intel Core 2 and Intel Xeon processors (family [06H], DisplayModel [17H]); for Intel Atom processors (family [06H], DisplayModel [1CH]): the time-stamp counter increments at a constant rate. That rate may be set by the maximum core-clock to bus-clock ratio of the processor or may be set by the maximum resolved frequency at which the processor is booted. The maximum resolved frequency may differ from the processor base frequency, see Section 18.18.2 for more detail. On certain processors, the TSC frequency may not be the same as the frequency in the brand string.

The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource.

So for quick measurements you should be able to use __rdtsc or __rdtscp. You can calibrate the TSC frequency at startup and confirm that it doesn't depend on CPU states by checking the invariant TSC bit. The thread could still be preempted, though, so it's good to repeat the measurement multiple times or use QueryThreadCycleTime (which of course comes with its own overhead). In practice I find RDTSC not as bad as it is presented in the question "Calculate system time using rdtsc", though the latter is still a good read.

answered Sep 29 '22 at 02:09 by rustyx