
How to measure x86 and x86-64 assembly commands execution time in processor cycles? [duplicate]

I want to write a bunch of optimizations for gcc using genetic algorithms. I need to measure the execution time of assembly functions for some stats and fitness functions. The usual time measurement can't be used, because it is influenced by the cache size.
So I need a table where I can look up something like this:

instruction | operands | operand sizes | execution cycles

Am I misunderstanding something? Sorry for my bad English.

eox425 asked Jul 15 '10 10:07





3 Answers

With modern CPUs, there are no simple tables to look up how long an instruction will take to complete (although such tables exist for some old processors, e.g. the 486). Your best information on what each instruction does and how long it might take comes from the chip manufacturer. E.g. Intel's documentation manuals are quite good (there's also an optimisation manual on that page).

On pretty much all modern CPUs there is also the RDTSC instruction, which reads the time stamp counter of the processor on which the code is running into EDX:EAX. It has pitfalls of its own, but essentially, if the code you are profiling is representative of a real use situation and its execution doesn't get interrupted or shifted to another CPU core, you can use this instruction to get the timings you want. I.e. surround the code you are optimising with two RDTSC instructions and take the difference between the two TSC values as the timing. (Timings can vary greatly between tests and situations; statistics is your friend.)
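A minimal sketch of that approach in C, assuming GCC or Clang on x86-64 (the `__rdtsc` and `_mm_lfence` intrinsics come from `<x86intrin.h>`). The names `read_ticks`, `work` and `median_ticks` are made up for illustration, and `work` merely stands in for the code being profiled. Note that on most current CPUs the TSC ticks at a constant reference rate, so the result is in TSC ticks rather than exact core clock cycles. On other architectures the sketch falls back to `clock_gettime` nanoseconds, purely so that it still compiles.

```c
#include <stdint.h>
#include <stdlib.h>

#if defined(__x86_64__)
#include <x86intrin.h>
/* Read the time stamp counter. The fences keep neighbouring
   instructions from being reordered across the read. */
static inline uint64_t read_ticks(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}
#else
#include <time.h>
/* Assumption: non-x86 fallback, returns nanoseconds instead of TSC ticks. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical code under test: sums 0..n-1. The volatile accumulator
   stops the compiler from optimising the loop away. */
static uint64_t work(uint64_t n) {
    volatile uint64_t acc = 0;
    for (uint64_t i = 0; i < n; i++)
        acc += i;
    return acc;
}

/* qsort comparator for uint64_t values. */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Time fn(arg) `runs` times and return the median tick count.
   The median discards outliers caused by interrupts or migrations. */
static uint64_t median_ticks(uint64_t (*fn)(uint64_t), uint64_t arg, size_t runs) {
    if (runs == 0)
        return 0;
    uint64_t *samples = malloc(runs * sizeof *samples);
    if (samples == NULL)
        return 0;
    for (size_t i = 0; i < runs; i++) {
        uint64_t start = read_ticks();
        fn(arg);
        uint64_t end = read_ticks();
        samples[i] = end - start;
    }
    qsort(samples, runs, sizeof *samples, cmp_u64);
    uint64_t median = samples[runs / 2];
    free(samples);
    return median;
}
```

Calling `median_ticks(work, 100000, 31)` would give a typical tick count for one call of `work`. Pinning the process to one core (for example with `taskset` on Linux) reduces the risk of the two reads landing on different cores.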

PhiS answered Oct 03 '22 14:10



What about reading the system clock value?

Quonux answered Oct 03 '22 15:10



You can instrument your code using assembly (rdtsc and friends) or using an instrumentation API like PAPI. Accurately measuring the clock cycles spent executing a single instruction is not possible, however; you can refer to your architecture's developer manuals for the best estimates.

In both cases, be careful to account for the effects of running in an SMP environment.

Michael Foukarakis answered Oct 03 '22 13:10
