
Passing value as a function argument vs calculating it twice?

I recall from Agner Fog's excellent guide that 64-bit Linux can pass 6 integer function parameters via registers:

http://www.agner.org/optimize/optimizing_cpp.pdf

(page 8)

I have the following function:

void x(signed int a, uint b, char c, uint d, uint e, signed short f);

and I need to pass an additional unsigned short parameter, which would make 7 in total. However, I can actually derive the value of the 7th from one of the existing 6.

So my question is which of the following is a better practice for performance:

  • Passing the already-calculated value as a 7th argument on 64-bit Linux
  • Not passing the already-calculated value, but recalculating it inside the function from one of the existing 6 arguments.

The operation in question is a simple bitwise AND:

unsigned short g = c & 1;

I don't fully understand x86 assembly, so I am not sure how precious registers are, or whether it is better to recalculate a value as a local variable than to pass it through function calls as an argument.

My belief is that it would be better to calculate the value twice, because it is such a simple, single-cycle operation.

EDIT: I know I can just profile this, but I'd also like to understand what is happening under the hood with both approaches. Does having a 7th argument mean that cache/memory is involved, rather than registers?

user997112 asked Feb 26 '14 17:02


2 Answers

The machine-level conventions for passing arguments are defined by the application binary interface (ABI); for Linux on x86-64 they are described in the x86-64 ABI spec. See also the x86 calling conventions Wikipedia page.

In your case, it is probably not worthwhile to pass c & 1 as an additional parameter (since that 7th parameter is passed on the stack).
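To make the two options concrete, here is a minimal sketch in C. The names `x_seven_args` and `x_recompute` are hypothetical, `uint` is assumed to be a typedef for `unsigned int`, and the function bodies are invented purely so the two variants can be compared. The register assignments in the comments follow the System V AMD64 convention for integer arguments.

```c
typedef unsigned int uint;  /* assumption: the question's `uint` means unsigned int */

/* Option 1: pass the precomputed value as a 7th integer argument.
 * Under the System V AMD64 ABI the first six integer arguments travel in
 * rdi, rsi, rdx, rcx, r8 and r9; the seventh (g) is passed on the stack. */
long x_seven_args(signed int a, uint b, char c, uint d, uint e,
                  signed short f, unsigned short g)
{
    /* Placeholder body: combine the inputs so the result is observable. */
    return (long)a + b + c + d + e + f + g;
}

/* Option 2: keep six arguments (all in registers) and recompute the bit. */
long x_recompute(signed int a, uint b, char c, uint d, uint e, signed short f)
{
    unsigned short g = c & 1;  /* one AND on a value that arrived in a register */
    return (long)a + b + c + d + e + f + g;
}
```

Compiling both variants with `gcc -O2 -S` and reading the generated assembly is one way to see how the 7th argument is handled on a given compiler version.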

Don't forget that current processor cores (on desktop or laptop computers) are usually superscalar and execute instructions out of order, so the c & 1 operation could be done in parallel with other operations and might cost "nothing".

But leave such micro-optimizations to the compiler. If you care a lot about performance, use a recent GCC (e.g. GCC 4.8) with gcc-4.8 -O3 -flto both for compiling and for linking (i.e. enable link-time optimization).

BTW, cache performance is much more relevant than such micro-optimizations. A single cache miss may take the same time (e.g. 250 nanoseconds) as hundreds of CPU machine instructions. Current CPUs are rumored to spend most of their time waiting for the caches. You might want to add a few explicit (and judicious) calls to __builtin_prefetch (see this question and this answer). But adding too many of these prefetches would slow down your code.

Finally, readability and maintainability of your code should matter much more than raw performance!
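As a rough illustration of what an explicit prefetch looks like, here is a minimal sketch using the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`. The function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical, and the distance of 16 elements is an arbitrary assumption that would need tuning by measurement.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = read access, 3 = high temporal locality. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

A simple linear scan like this is usually handled well by the hardware prefetcher already, so the builtin is more likely to pay off with irregular access patterns such as pointer chasing. Only profiling can tell whether it helps in a particular case.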

Basile Starynkevitch answered Nov 12 '22 04:11


Basile's answer is good. I'll just point out two more things to keep in mind:
a) The stack is very likely to be in L1 cache, so passing arguments on the stack should not cost more than ~3 extra cycles.
b) The ABI (x86-64 System V, in this case) requires clobbered registers to be restored. Some are saved by the caller, others by the callee. Obviously, the registers used to pass arguments must be saved by the caller if their original contents are needed again. But when your function needs more registers than the caller-saved ones provide, any additional temporary results it calculates must go into a callee-saved register. So the function ends up spilling a register to the stack, reusing the register for your temporary variable, and then popping the original value back.
The only way to avoid accessing memory is to use a smaller, simpler function that needs fewer temporary variables.
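The register-pressure point can be sketched in C as follows. The functions `helper`, `flag_before_call` and `flag_no_call` are hypothetical, and the comments describe what a compiler typically has to do under the System V AMD64 ABI rather than guaranteed output.

```c
/* Hypothetical helper; noinline (a GCC/Clang attribute) keeps the call real. */
__attribute__((noinline)) static unsigned helper(unsigned v)
{
    return v * 3u + 1u;
}

/* g is computed before the call and used after it, so it stays live across
 * the call: the compiler has to keep it in a callee-saved register (saving
 * and restoring that register's old contents) or spill it to the stack. */
unsigned flag_before_call(char c, unsigned v)
{
    unsigned short g = c & 1;
    unsigned r = helper(v);
    return r + g;
}

/* No value has to survive a call here, so the function can typically work
 * entirely in caller-saved (scratch) registers without touching the stack. */
unsigned flag_no_call(char c, unsigned v)
{
    unsigned short g = c & 1;
    return v * 3u + 1u + g;
}
```

Note that in `flag_before_call` either `g` or `c` has to stay live across the call, so recomputing the bit after the call would not remove the save and restore. Only the simpler function avoids it.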

EOF answered Nov 12 '22 04:11