I want to write a bunch of optimizations for gcc using genetic algorithms.
I need to measure execution time of an assembly functions for some stats and fit functions.
The usual time measurement can't be used, 'cause it is influenced by the cache size.
So I need a table where I can see something like this.
command | operands | operands sizes | execution cycles
Am I missunderstanding something? Sorry for bad English.
Reciprocal throughput: The average number of core clock cycles per instruction for a series of independent instructions of the same kind in the same thread. For add this is listed as 0.25 meaning that up to 4 add instructions can execute every cycle (giving a reciprocal throughput of 1 / 4 = 0.25 ).
rax is the 64-bit, "long" size register. It was added in 2003 during the transition to 64-bit processors. eax is the 32-bit, "int" size register. It was added in 1985 during the transition to 32-bit processors with the 80386 CPU.
With modern CPU's, there are no simple tables to look up how long an instruction will take to complete (although such tables exist for some old processors, e.g. 486). Your best information on what each instruction does and how long it might take comes from the chip manufacturer. E.g. Intel's documentation manuals are quite good (there's also an optimisation manual on that page).
On pretty much all modern CPU's there's also the RDTSC
instruction that reads the time stamp counter for the processor on which the code is running into EDX:EAX
. There are pitfalls with this also, but essentially if the code you are profiling is representative of a real use situation, its execution doesn't get interrupted or shifted to another CPU core, then you can use this instruction to get the timings you want. I.e. surround the code you are optimising with two RDTSC
instructions and take the difference in TSC as the timing. (Variances on timings in different tests/situations can be great; statistics is your friend.)
reading the system clock value?
You can instrument your code using assembly (rdtsc and friends) or using a instrumentation API like PAPI. Accurately measuring clock cycles that were spent during the execution of one instruction is not possible, however - you can refer to your architecture developer manuals for the best estimates.
In both cases, you should be careful when taking into account effects from running on a SMP environment.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With