CPU TSC fetch operation especially in multicore-multi-processor environment

Tags:

In Linux world, to get nano seconds precision timer/clockticks one can use :

#include <sys/time.h>

int foo()
{
   timespec ts;

   clock_gettime(CLOCK_REALTIME, &ts); 
   //--snip--      
}

This answer suggests an asm approach to directly query for the cpu clock with the RDTSC instruction.

In a multi-core, multi-processor architecture, how is this clock ticks/timer value synchronized across multiple cores/processors? My understanding is that there in inherent fencing being done. Is this understanding correct?

Can you suggest some documentation that would explain this in detail? I am interested in Intel Nehalem and Sandy Bridge microarchitectures.

EDIT

Limiting the process to a single core or cpu is not an option as the process is really huge(in terms of resources consumed) and would like to optimally utilize all the resources in the machine that includes all the cores and processors.

Edit

Thanks for the confirmation that the TSC is synced across cores and processors. But my original question is how is this synchronization done ? is it with some kind of fencing ? do you know of any public documentation ?

Conclusion

Thanks for all the inputs: Here's the conclusion for this discussion: The TSCs are synchronized at the initialization using a RESET that happens across the cores and processors in a multi processor/multi core system. And after that every Core is on their own. The TSCs are kept invariant with a Phase Locked Loop that would normalize the frequency variations and thus the clock variations within a given Core and that is how the TSC remain in sync across cores and processors.

568

asked Jun 06 '12 19:06

Jay D

2 Answers

Straight from Intel, here's an explanation of how recent processors maintain a TSC that ticks at a constant rate, is synchronous between cores and packages on a multi-socket motherboard, and may even continue ticking when the processor goes into a deep sleep C-state, in particular see the explanation by Vipin Kumar E K (Intel):

http://software.intel.com/en-us/articles/best-timing-function-for-measuring-ipp-api-timing/

Here's another reference from Intel discussing the synchronization of the TSC across cores, in this case they mention the fact that rdtscp allows you to read both the TSC and the processor id atomically, this is important in tracing applications... suppose you want to trace the execution of a thread that might migrate from one core to another, if you do that in two separate instructions (non-atomic) then you don't have certainty of which core the thread was in at the time it read the clock.

http://software.intel.com/en-us/articles/intel-gpa-tip-cannot-sychronize-cpu-timestamps/

All sockets/packages on a motherboard receive two external common signals:

RESET
Reference CLOCK

All sockets see RESET at the same time when you power the motherboard, all processor packages receive a reference clock signal from an external crystal oscillator and the internal clocks in the processor are kept in phase (although usually with a high multiplier, like 25x) with circuitry called a Phase Locked Loop (PLL). Recent processors will clock the TSC at the highest frequency (multiplier) that the processor is rated (so called constant TSC), regardless of the multiplier that any individual core may be using due to temperature or power management throttling (so called invariant TSC). Nehalem processors like the X5570 released in 2008 (and newer Intel processors) support a "Non-stop TSC" that will continue ticking even when conserving power in a deep power down C-state (C6). See this link for more information on the different power down states:

http://www.anandtech.com/show/2199

Upon further research I came across a patent Intel filed on 12/22/2009 and was published on 6/23/2011 entitled "Controlling Time Stamp Counter (TSC) Offsets For Mulitple Cores And Threads"

http://www.freepatentsonline.com/y2011/0154090.html

Google's page for this patent application (with link to USPTO page)

http://www.google.com/patents/US20110154090

From what I gather there is one TSC in the uncore (the logic in a package surrounding the cores but not part of any core) which is incremented on every external bus clock by the value in the field of the machine specific register specified by Vipin Kumar in the link above (MSR_PLATFORM_INFO[15:8]). The external bus clock runs at 133.33MHz. In addition each core has it's own TSC register, clocked by a clock domain that is shared by all cores and may be different from the clock for any one core - therefore there must be some kind of buffer when the core TSC is read by the RDTSC (or RDTSCP) instruction running in a core. For example, MSR_PLATFORM_INFO[15:8] may be set to 25 on a package, every bus clock the uncore TSC increments by 25, there is a PLL that multiplies the bus clock by 25 and provides this clock to each of the cores to clock their local TSC register, thereby keeping all TSC registers in synch. So to map the terminology to actual hardware

Constant TSC is implemented by using the external bus clock running at 133.33 MHz which is multiplied by a constant multiplier specified in MSR_PLATFORM_INFO[15:8]
Invariant TSC is implemented by keeping the TSC in each core on a separate clock domain
Non-stop TSC is implemented by having an uncore TSC that is incremented by MSR_PLATFORM_INFO[15:8] ticks on every bus clock, that way a multi-core package can go into deep power down (C6 state) and can shutdown the PLL... there is no need to keep a clock at the higher multiplier. When a core is resumed from C6 state its internal TSC will get initialized to the value of the uncore TSC (the one that didn't go to sleep) with an offset adjustment in case software has written a value to the TSC, the details of which are in the patent. If software does write to the TSC then the TSC for that core will be out of phase with other cores, but at a constant offset (the frequency of the TSC clocks are all tied to the bus reference clock by a constant multiplier).

142

answered Sep 19 '22 13:09

amdn

On newer CPUs (i7 Nehalem+ IIRC) the TSC is synchronzied across all cores and runs a constant rate. So for a single processor, or more than one processor on a single package or mainboard(!) you can rely on a synchronzied TSC.

From the Intel System Manual 16.12.1

The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processors support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward.

On older processors you can not rely on either constant rate or synchronziation.

Edit: At least on multiple processors in a single package or mainboard the invariant TSC is synchronized. The TSC is reset to zero at a /RESET and then ticks onward at a constant rate on each processor, without drift. The /RESET signal is guaranteed to arrive at each processor at the same time.

answered Sep 22 '22 13:09

Gunther Piez

Related questions
                            
                                Does a pointer point to the LSB or MSB?
                            
                                Moving a file on Linux in C
                            
                                What is an example in which knowing C will make me write better code in any other language?
                            
                                Send string in PUT request with libcurl
                            
                                implicit declaration using -std=c99
                            
                                How to use struct timeval to get the execution time?
                            
                                How does the Levenberg–Marquardt algorithm work in detail but in an understandable way?
                            
                                Rand Implementation
                            
                                understanding requirements for execve and setting environment vars
                            
                                C Assign Pointer to NULL
                            
                                Use of raw pointers in modern C++ post C++11 [closed]
                            
                                Why does adding to a pointer with += work, but pointer + 1 doesn't?
                            
                                how many times will strlen() be called in this for loop?
                            
                                Is object orientation bad for embedded systems, and why? [closed]
                            
                                Understand the assembly code generated by a simple C program
                            
                                What's the correct term for the '...' token?
                            
                                What's the difference in practice between inline and #define?
                            
                                What is the meaning of "%-*.*s" in a printf format string?
                            
                                is there a GCC compiler/linker option to change the name of main? [duplicate]
                            
                                Why are static variables auto-initialized to zero? [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

CPU TSC fetch operation especially in multicore-multi-processor environment

Tags:

c

assembly

multicore

cpu-registers

microprocessors

Jay D

People also ask

2 Answers

amdn

Gunther Piez

Recent Activity

Donate For Us