When a C function has to return multiple values, there are a few ways to go about it.
Right now I'm interested in the relative efficiency of two of those methods:
a) bundle the values in a struct foo. Populate a local foo, and return that.
b) pass pointers to be populated.
(I'm working on some legacy code that has a mix of the two.)
For the purposes of this post:
Obviously inlining would make the question moot.
Can the different methods affect the compiler's ability to inline?
If not inlined, will there be a performance difference between the two methods?
Can the placement of pointer-to-return-value parameters in the argument list have an effect, either on the compiler's ability to inline or on non-inlined performance?
Edited (a) for clarity.
On Linux / x86-64, a struct of exactly two words (e.g. two pointers, two intptr_t, or two longs) is returned in two registers. This is a lot faster than, say, malloc-ing it, and might be faster than writing into a two-word struct that the caller allocated on its call stack (although such a struct is likely to sit in a fast CPU cache anyway). Remember that on recent processors a cache miss that goes all the way to main memory can cost on the order of a hundred nanoseconds, enough time for hundreds of register-to-register integer additions.
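As a rough illustration of the two-register case, here is a minimal sketch of both styles side by side. The names `divmod_result`, `divmod_by_value`, and `divmod_by_pointer` are made up for this example, and the register assignment mentioned in the comments assumes the System V x86-64 ABI; other ABIs may handle the struct differently.

```c
#include <assert.h>

/* Hypothetical two-word result type. Under the System V x86-64 ABI
 * (an assumption of this sketch) it is returned in RAX and RDX. */
struct divmod_result {
    long quot;
    long rem;
};

/* (a) Populate a local struct and return it by value. */
struct divmod_result divmod_by_value(long n, long d)
{
    struct divmod_result r;

    assert(d != 0);
    r.quot = n / d;
    r.rem = n % d;
    return r;
}

/* (b) Write the results through pointers supplied by the caller. */
void divmod_by_pointer(long n, long d, long *quot, long *rem)
{
    assert(d != 0);
    *quot = n / d;
    *rem = n % d;
}
```

Compiling this with `-O2 -S` and comparing the two functions is a quick way to check what a particular compiler and target actually do.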
But inlining a function is not always faster. You could also use partial evaluation techniques or C++ code generation.
With a recent GCC compiler, also consider compiling and linking all C or C++ files with link-time optimization (e.g. -flto -O2).
I think the question is: which is faster (assuming no inlining):
void fn(int *a, int *b, int *c) {
*a = ...;
*b = ...;
... etc.
}
vs.
void fn(struct foo *f) {
f->a = ...;
f->b = ...;
... etc.
}
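In isolation, the struct variant will be faster once there are more outputs than argument registers, because it does not have to load the individual pointers from memory. Only a few pointers can be passed in registers (six integer arguments under the x86-64 System V calling convention, none under 32-bit cdecl), and the rest are passed on the stack.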
However, the caller context also matters. If the caller looks like this:
int a; double d1; int b; double d2; int c; ...
struct foo f;
fn(&f);
a = f.a;
b = f.b;
... etc.
then the savings will be largely negated by the "unpack foo" code.
But if the caller looks like this:
struct foo f;
fn(&f);
if (f.a != 0) ...
int x = f.a + f.b;
... etc.
then the "unpack" code will not be present.
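To make the two caller shapes concrete, here is a small compilable sketch. The three-field `struct foo`, the `seed` parameter, and the function names `fn_struct`, `fn_ptrs`, `sum_in_place`, and `sum_separate` are all assumptions invented for illustration; the original snippets leave those details elided.

```c
/* Hypothetical result bundle; only the field names a, b, c come from
 * the snippets above. */
struct foo {
    int a;
    int b;
    int c;
};

/* Struct variant: a single pointer argument, however many fields. */
void fn_struct(struct foo *f, int seed)
{
    f->a = seed;
    f->b = seed * 2;
    f->c = seed * 3;
}

/* Pointer variant: one pointer argument per output value. */
void fn_ptrs(int *a, int *b, int *c, int seed)
{
    *a = seed;
    *b = seed * 2;
    *c = seed * 3;
}

/* Caller that consumes the fields in place, so the source contains
 * no "unpack" copies into separate locals. */
int sum_in_place(int seed)
{
    struct foo f;

    fn_struct(&f, seed);
    if (f.a != 0)
        return f.a + f.b + f.c;
    return 0;
}

/* Caller that wants each result in its own local variable. */
int sum_separate(int seed)
{
    int a, b, c;

    fn_ptrs(&a, &b, &c, seed);
    return a + b + c;
}
```

With only three outputs, all pointers still fit in registers on x86-64, so any difference here is likely to be small; the gap should only show up with more outputs or on a stack-based calling convention, and measuring is the only reliable way to tell.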