I want to get the CPU cycles at a specific point. I use this function at that point: <pre class="prettyprint"><code>static __inline__ unsigned long long rdtsc(void) { unsigned long long int x; __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x)); // broken for 64-bit builds; don't copy this code return x; } </code></pre> (editor's note: <code>"=A"</code> is wrong for x86-64; it picks either RDX or RAX. Only in 32-bit mode will it pick the EDX:EAX output you want. See How to get the CPU cycle count in x86_64 from C++?.) The problem is that it returns always an increasing number (in every run). It's as if it is referring to the absolute time. Am I using the functions incorrectly?

As long as your thread stays on the same CPU core, the RDTSC instruction will keep returning an increasing number until it wraps around. For a 2GHz CPU, this happens after 292 years, so it is not a real issue. You probably won't see it happen. If you expect to live that long, make sure your computer reboots, say, every 50 years. The problem with RDTSC is that you have no guarantee that it starts at the same point in time on all cores of an elderly multicore CPU and no guarantee that it starts at the same point in time time on all CPUs on an elderly multi-CPU board. Modern systems usually do not have such problems, but the problem can also be worked around on older systems by setting a thread's affinity so it only runs on one CPU. This is not good for application performance, so one should not generally do it, but for measuring ticks, it's just fine. (Another "problem" is that many people use RDTSC for measuring time, which is not what it does, but you wrote that you want CPU cycles, so that is fine. If you do use RDTSC to measure time, you may have surprises when power saving or hyperboost or whatever the multitude of frequency-changing techniques are called kicks in. For actual time, the <code>clock_gettime</code> syscall is surprisingly good under Linux.) I would just write <code>rdtsc</code> inside the <code>asm</code> statement, which works just fine for me and is more readable than some obscure hex code. Assuming it's the correct hex code (and since it neither crashes and returns an ever-increasing number, it seems so), your code is good. If you want to measure the number of ticks a piece of code takes, you want a tick difference, you just need to subtract two values of the ever-increasing counter. Something like <code>uint64_t t0 = rdtsc(); ... uint64_t t1 = rdtsc() - t0;</code> Note that for if very accurate measurements isolated from surrounding code are necessary, you need to serialize, that is stall the pipeline, prior to calling <code>rdtsc</code> (or use <code>rdtscp</code> which is only supported on newer processors). The one serializing instruction that can be used at every privilegue level is <code>cpuid</code>. In reply to the further question in the comment: The TSC starts at zero when you turn on the computer (and the BIOS resets all counters on all CPUs to the same value, though some BIOSes a few years ago did not do so reliably). Thus, from your program's point of view, the counter started "some unknown time in the past", and it always increases with every clock tick the CPU sees. Therefore if you execute the instruction returning that counter now and any time later in a different process, it will return a greater value (unless the CPU was suspended or turned off in between). Different runs of the same program get bigger numbers, because the counter keeps growing. Always. Now, <code>clock_gettime(CLOCK_PROCESS_CPUTIME_ID)</code> is a different matter. This is the CPU time that the OS has given to the process. It starts at zero when your process starts. A new process starts at zero, too. Thus, two processes running after each other will get very similar or identical numbers, not ever growing ones. <code>clock_gettime(CLOCK_MONOTONIC_RAW)</code> is closer to how RDTSC works (and on some older systems is implemented with it). It returns a value that ever increases. Nowadays, this is typically a HPET. However, this is really time, and not ticks. If your computer goes into low power state (e.g. running at 1/2 normal frequency), it will still advance at the same pace.

Getting cpu cycles using RDTSC - why does the value of RDTSC always increase?

Tags:

linux

x86

assembly

cpu-usage

rdtsc

I want to get the CPU cycles at a specific point. I use this function at that point:

static __inline__ unsigned long long rdtsc(void)
{
    unsigned long long int x;
    __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
    // broken for 64-bit builds; don't copy this code
    return x;
}

(editor's note: "=A" is wrong for x86-64; it picks either RDX or RAX. Only in 32-bit mode will it pick the EDX:EAX output you want. See How to get the CPU cycle count in x86_64 from C++?.)

The problem is that it returns always an increasing number (in every run). It's as if it is referring to the absolute time.

Am I using the functions incorrectly?

701

asked Dec 22 '11 10:12

user1106106

3 Answers

As long as your thread stays on the same CPU core, the RDTSC instruction will keep returning an increasing number until it wraps around. For a 2GHz CPU, this happens after 292 years, so it is not a real issue. You probably won't see it happen. If you expect to live that long, make sure your computer reboots, say, every 50 years.

The problem with RDTSC is that you have no guarantee that it starts at the same point in time on all cores of an elderly multicore CPU and no guarantee that it starts at the same point in time time on all CPUs on an elderly multi-CPU board.
Modern systems usually do not have such problems, but the problem can also be worked around on older systems by setting a thread's affinity so it only runs on one CPU. This is not good for application performance, so one should not generally do it, but for measuring ticks, it's just fine.

(Another "problem" is that many people use RDTSC for measuring time, which is not what it does, but you wrote that you want CPU cycles, so that is fine. If you do use RDTSC to measure time, you may have surprises when power saving or hyperboost or whatever the multitude of frequency-changing techniques are called kicks in. For actual time, the clock_gettime syscall is surprisingly good under Linux.)

I would just write rdtsc inside the asm statement, which works just fine for me and is more readable than some obscure hex code. Assuming it's the correct hex code (and since it neither crashes and returns an ever-increasing number, it seems so), your code is good.

If you want to measure the number of ticks a piece of code takes, you want a tick difference, you just need to subtract two values of the ever-increasing counter. Something like uint64_t t0 = rdtsc(); ... uint64_t t1 = rdtsc() - t0;
Note that for if very accurate measurements isolated from surrounding code are necessary, you need to serialize, that is stall the pipeline, prior to calling rdtsc (or use rdtscp which is only supported on newer processors). The one serializing instruction that can be used at every privilegue level is cpuid.

In reply to the further question in the comment:

The TSC starts at zero when you turn on the computer (and the BIOS resets all counters on all CPUs to the same value, though some BIOSes a few years ago did not do so reliably).

Thus, from your program's point of view, the counter started "some unknown time in the past", and it always increases with every clock tick the CPU sees. Therefore if you execute the instruction returning that counter now and any time later in a different process, it will return a greater value (unless the CPU was suspended or turned off in between). Different runs of the same program get bigger numbers, because the counter keeps growing. Always.

Now, clock_gettime(CLOCK_PROCESS_CPUTIME_ID) is a different matter. This is the CPU time that the OS has given to the process. It starts at zero when your process starts. A new process starts at zero, too. Thus, two processes running after each other will get very similar or identical numbers, not ever growing ones.

clock_gettime(CLOCK_MONOTONIC_RAW) is closer to how RDTSC works (and on some older systems is implemented with it). It returns a value that ever increases. Nowadays, this is typically a HPET. However, this is really time, and not ticks. If your computer goes into low power state (e.g. running at 1/2 normal frequency), it will still advance at the same pace.

143

answered Oct 09 '22 08:10

Damon

There's lots of confusing and/or wrong information about the TSC out there, so I thought I'd try to clear some of it up.

When Intel first introduced the TSC (in original Pentium CPUs) it was clearly documented to count cycles (and not time). However, back then CPUs mostly ran at a fixed frequency, so some people ignored the documented behaviour and used it to measure time instead (most notably, Linux kernel developers). Their code broke in later CPUs that don't run at a fixed frequency (due to power management, etc). Around that time other CPU manufacturers (AMD, Cyrix, Transmeta, etc) were confused and some implemented TSC to measure cycles and some implemented it so it measured time, and some made it configurable (via. an MSR).

Then "multi-chip" systems became more common for servers; and even later multi-core was introduced. This led to minor differences between TSC values on different cores (due to different startup times); but more importantly it also led to major differences between TSC values on different CPUs caused by CPUs running at different speeds (due to power management and/or other factors).

People that were trying to use it wrong from the start (people who used it to measure time and not cycles) complained a lot, and eventually convinced CPU manufacturers to standardise on making the TSC measure time and not cycles.

Of course this was a mess - e.g. it takes a lot of code just to determine what the TSC actually measures if you support all 80x86 CPUs; and different power management technologies (including things like SpeedStep, but also things like sleep states) may effect TSC in different ways on different CPUs; so AMD introduced a "TSC invariant" flag in CPUID to tell the OS that the TSC can be used to measure time correctly.

All recent Intel and AMD CPUs have been like this for a while now - TSC counts time and doesn't measure cycles at all. This means if you want to measure cycles you had to use (model specific) performance monitoring counters. Unfortunately the performance monitoring counters are an even worse mess (due to their model specific nature and convoluted configuration).

answered Oct 09 '22 09:10

Brendan

good answers already, and Damon already mentioned this in a way in his answer, but I'll add this from the actual x86 manual (volume 2, 4-301) entry for RDTSC:

Loads the current value of the processor's time-stamp counter (a 64-bit MSR) into the EDX:EAX registers. The EDX register is loaded with the high-order 32 bits of the MSR and the EAX register is loaded with the low-order 32 bits. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are cleared.)

The processor monotonically increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset. See "Time Stamp Counter" in Chapter 17 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, for specific details of the time stamp counter behavior.

answered Oct 09 '22 09:10

galois

Related questions
                            
                                Cannot use mkdir in home directory: permission denied (Linux Lubuntu) [closed]
                            
                                chmod - protect users' file being accessed so only owner can access?
                            
                                why am I getting Exec format error when I am writing my linux service?
                            
                                Make a Linux "GUI" in the command line
                            
                                Download YouTube videos with wget
                            
                                Could not find package laravel-laravel with stability stable
                            
                                Launching an app in heroku? What is procfile? 'web:' command?
                            
                                How to show the disk usage of each subdirectory in Linux?
                            
                                Is there an OS command I can run to determine if running inside a Xen based virtual machine
                            
                                Remove all files that does not have the following extensions in Linux
                            
                                Determine Redhat Linux Version
                            
                                difference between $@ and $* in bash script [duplicate]
                            
                                Is it possible to modify an element in the DOM with Puppeteer before creating a screenshot?
                            
                                sort logfile by timestamp on linux command line
                            
                                Sort a tab delimited file based on column sort command bash [duplicate]
                            
                                WKHTMLTOPDF Installation error on Ubuntu
                            
                                Linux - check if there is an empty line at the end of a file [duplicate]
                            
                                Shell shift procedure - What is this?
                            
                                How do I make Subversion use a third-party diff tool?
                            
                                1ms resolution timer under linux recommended way

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With