 

std::chrono::clock, hardware clock and cycle count

std::chrono offers several clocks to measure time. At the same time, I guess the only way a CPU can evaluate time is by counting cycles.

Question 1: Does a CPU or a GPU have any other way to evaluate time than by counting cycles?

If that is the case, then because the way a computer counts cycles will never be as precise as an atomic clock, a "second" (period = std::ratio<1>) for a computer can actually be shorter or longer than an actual second, causing the computer clock to drift in the long run relative to, say, GPS time.

Question 2: Is that correct?

Some hardware has varying frequencies (for example, idle and turbo modes). In that case, the number of cycles per second would vary.

Question 3: Does the "cycle count" measured by CPUs and GPUs vary depending on the hardware frequency? If yes, how does std::chrono deal with it? If not, what does a cycle correspond to (what is the "fundamental" unit of time)? Is there a way to access the conversion at compile time? Is there a way to access it at runtime?

asked Jun 15 '18 by Vincent


2 Answers

Counting cycles, yes, but cycles of what?

On a modern x86, the timesource used by the kernel (internally and for clock_gettime and other system calls) is typically a fixed-frequency counter that counts "reference cycles" regardless of turbo, power-saving, or clock-stopped idle. (This is the counter you get from rdtsc, or __rdtsc() in C/C++).
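For illustration, here's a minimal sketch of reading that counter directly with the __rdtsc() intrinsic, assuming x86 and GCC/Clang (MSVC exposes the same intrinsic via <intrin.h>):

```cpp
// Sketch: read the TSC directly (x86, GCC/Clang).
#include <x86intrin.h>  // __rdtsc
#include <cstdint>
#include <cstdio>

int main() {
    uint64_t start = __rdtsc();
    // ... work to be measured ...
    uint64_t end = __rdtsc();
    // On CPUs with an invariant TSC, the difference is in fixed-frequency
    // "reference cycles", not core clock cycles.
    std::printf("elapsed reference cycles: %llu\n",
                static_cast<unsigned long long>(end - start));
}
```

Note that rdtsc isn't ordered with respect to surrounding instructions; the second answer comes back to the fencing question.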

Normal std::chrono implementations will use an OS-provided function like clock_gettime on Unix. (On Linux, this can run purely in user space: the code plus scale-factor data live in a VDSO page that the kernel maps into every process's address space. Low-overhead timesources are nice. Avoiding a user->kernel->user round trip helps a lot with Meltdown + Spectre mitigations enabled.)
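So the portable way to time something is just std::chrono; a minimal sketch, which on Linux/libstdc++ typically boils down to clock_gettime(CLOCK_MONOTONIC) via the VDSO:

```cpp
// Sketch: portable timing with std::chrono::steady_clock.
#include <chrono>
#include <cstdio>

int main() {
    auto t0 = std::chrono::steady_clock::now();
    // ... work to be measured ...
    auto t1 = std::chrono::steady_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0);
    std::printf("elapsed: %lld ns\n", static_cast<long long>(ns.count()));
}
```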

When profiling a tight loop that's not memory-bound, you might want to count actual core clock cycles, so the measurement is insensitive to the current core's clock speed (and doesn't have to worry about ramping the CPU up to max turbo, etc.), e.g. using perf stat ./a.out or perf record ./a.out. See, for example, Can x86's MOV really be "free"? Why can't I reproduce this at all?


Some systems didn't / don't have a wall-clock-equivalent counter built right into the CPU, so either the OS would maintain a time in RAM that it updates on timer interrupts, or time-query functions would read the time from a separate chip.

(System call + hardware I/O = higher overhead, which is part of the reason that x86's rdtsc instruction morphed from a profiling thing into a clocksource thing.)

All of these clock frequencies are ultimately derived from a crystal oscillator on the mobo. But the scale factors to extrapolate time from cycle counts can be adjusted to keep the clock in sync with atomic time, typically using the Network Time Protocol (NTP), as @Tony points out.

answered Sep 21 '22 by Peter Cordes


Question 1: Does a CPU or a GPU have any other way to evaluate time than by counting cycles?

Different hardware may provide different facilities. For example, x86 PCs have employed several hardware facilities for timing: for the last decade or so, x86 CPUs have had Time Stamp Counters operating at their processing frequency or - more recently - at some fixed frequency (a "constant rate" aka "invariant" TSC); there may be a High Precision Event Timer (HPET), and going back further there were Programmable Interval Timers (https://en.wikipedia.org/wiki/Programmable_interval_timer).
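You can detect an invariant TSC at runtime; here's a sketch assuming x86 and GCC/Clang, using the documented CPUID flag (leaf 0x80000007, EDX bit 8, the "Invariant TSC" bit on both Intel and AMD):

```cpp
// Sketch: check for an invariant ("constant rate") TSC via CPUID.
#include <cpuid.h>
#include <cstdio>

int main() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx) && (edx & (1u << 8)))
        std::puts("invariant TSC: ticks at a fixed rate regardless of turbo/idle");
    else
        std::puts("no invariant TSC: cycle counts may track the core frequency");
}
```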

If that is the case, then because the way a computer counts cycles will never be as precise as an atomic clock, a "second" (period = std::ratio<1>) for a computer can actually be shorter or longer than an actual second, causing the computer clock to drift in the long run relative to, say, GPS time.

Yes, a computer without an atomic clock (they're now available on a chip) isn't going to be as accurate as an atomic clock. That said, services such as the Network Time Protocol (NTP) allow you to maintain tighter coherence across a bunch of computers, sometimes aided by Pulse Per Second (PPS) techniques. More modern and accurate variants include the Precision Time Protocol (PTP), which can often achieve sub-microsecond accuracy across a LAN.

Question 3: Does the "cycle count" measured by CPUs and GPUs vary depending on the hardware frequency?

That depends. For the TSC, newer "constant rate" implementations don't vary; older ones do.

If yes, how does std::chrono deal with it?

I'd expect most implementations to call an OS-provided time service, as the OS tends to have the best knowledge of, and access to, the hardware. There are a lot of factors that need to be considered - e.g. whether the TSC readings are in sync across cores, what happens if the PC goes into some kind of sleep mode, what manner of memory fences are desirable around the TSC sampling....
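On the fencing point, here's a sketch of one common pattern, assuming x86 and GCC/Clang; the exact guarantees vary by vendor and microarchitecture, which is part of why leaving this to the OS/library is usually the right call:

```cpp
// Sketch: one common fencing pattern around a TSC sample.
#include <x86intrin.h>
#include <cstdint>

static inline uint64_t fenced_rdtsc() {
    _mm_lfence();             // keep rdtsc from executing before earlier work
    uint64_t tsc = __rdtsc(); // out-of-order CPUs can otherwise reorder this
    _mm_lfence();             // keep later work from starting before the read
    return tsc;
}
```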

If not, what does a cycle correspond to (what is the "fundamental" unit of time)?

For Intel CPUs, see this answer.

Is there a way to access the conversion at compile time? Is there a way to access it at runtime?

std::chrono::duration::count exposes raw tick counts for whatever time source was used, and you can duration_cast to other units of time (e.g. seconds). C++20 is expected to introduce further facilities, like clock_cast. AFAIK there's no constexpr conversion available: that seems dubious anyway, given a program might end up running on a machine with a different TSC rate than the machine it was compiled on.
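To make the runtime side concrete, a small sketch: count() gives raw ticks, duration_cast converts units, and the clock's tick period is exposed as the compile-time ratio Clock::period (the period of the clock's duration type, not the hardware TSC rate):

```cpp
// Sketch: raw ticks via count(), unit conversion via duration_cast,
// and the compile-time tick period of the clock type.
#include <chrono>
#include <cstdio>

int main() {
    using clock = std::chrono::steady_clock;
    auto t0 = clock::now();
    auto t1 = clock::now();
    auto d  = t1 - t0;  // clock::duration, in the clock's native ticks
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(d);
    std::printf("raw ticks: %lld, microseconds: %lld\n",
                static_cast<long long>(d.count()),
                static_cast<long long>(us.count()));
    // clock::period is a std::ratio, available at compile time:
    static_assert(clock::period::den > 0, "tick period is a compile-time ratio");
}
```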

answered Sep 24 '22 by Tony Delroy