
Efficiently returning multiple values in C

When a C function has to return multiple values, there are a few ways to go about it.

Right now I'm interested in the relative efficiency of two of those methods:
a) bundle the values in a struct foo. Populate a local foo, and return that.
b) pass pointers to be populated.

(I'm working on some legacy code that has a mix of the two.)
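
For concreteness, the two approaches could look roughly like the sketch below. The fields of struct foo and both function names are made up for illustration; they are not taken from the legacy code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: a few primitives, so sizeof(struct foo) is small. */
struct foo {
    int count;
    const char *name;
};

/* (a) Populate a local struct foo and return it by value. */
struct foo get_foo_by_value(int seed)
{
    struct foo result;
    result.count = seed * 2;
    result.name = (seed > 0) ? "positive" : NULL;
    return result;
}

/* (b) Write each result through a caller-supplied pointer. */
void get_foo_by_pointers(int seed, int *count_out, const char **name_out)
{
    assert(count_out != NULL && name_out != NULL);
    *count_out = seed * 2;
    *name_out = (seed > 0) ? "positive" : NULL;
}
```

A caller of (a) writes `struct foo f = get_foo_by_value(3);`, while a caller of (b) declares the destination variables first and passes their addresses.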

For the purposes of this post:

  • All returned values are primitives: ints, pointer values, etc. So sizeof(struct foo) is very small.
  • Making struct foo opaque isn't a concern.
  • The functions in question have at most 12 parameters, including any ptr-to-return-value parameters.
  • Assume a somewhat modern compiler, e.g. gcc 11 or later.

Obviously inlining would make the question moot.
Can the different methods affect the compiler's ability to inline?
If not inlined, will there be a performance difference between the two methods?

Can the placement of pointer-to-return-value parameters in the argument list have an effect, either on the compiler's ability to inline or on non-inlined performance?

Edited (a) for clarity.

Underhill asked Oct 20 '25 03:10


2 Answers

This is ABI-specific.

On Linux/x86-64, a struct of up to two machine words (e.g. two pointers, two intptr_t, or two longs) is returned in two registers. This is a lot faster than, say, malloc-ing it, and might be faster than writing into a two-word struct that the caller allocated on its call stack, even though such a struct is likely to sit in some fast CPU cache. (Remember that on recent processors a cache miss may take a hundred nanoseconds or more, which is the time needed for hundreds of register-to-register integer additions.)
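
As a minimal sketch of the register-return case, assuming the System V x86-64 ABI used on Linux: the struct below holds two longs, so it fits in two machine words. The names struct pair and divide are invented for this example.

```c
#include <assert.h>

/* Hypothetical two-word result: 16 bytes on x86-64 Linux. */
struct pair {
    long quotient;
    long remainder;
};

/* Small enough for the two-register return path under the System V
   x86-64 ABI (an assumption; other ABIs may return it differently). */
static_assert(sizeof(struct pair) <= 2 * sizeof(void *),
              "struct pair should fit in two machine words");

/* Returns both results by value; the caller must pass a nonzero denominator. */
struct pair divide(long numerator, long denominator)
{
    struct pair result = { numerator / denominator, numerator % denominator };
    return result;
}
```

One way to check what a given compiler does is to build with `gcc -O2 -S` and read the generated assembly: under that ABI the two fields should come back in rax and rdx, with no store to caller-provided memory.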

But inlining a function is not always faster. You could also use partial evaluation techniques or C++ code generation.

With a recent GCC, also consider compiling all C or C++ files and linking with link-time optimization (e.g. -flto -O2).

Basile Starynkevitch answered Oct 22 '25 16:10


I think the question is: which is faster (assuming no inlining):

void fn(int *a, int *b, int *c) {
  *a = ...;
  *b = ...;
  ... etc.
}

vs.

void fn(struct foo *f) {
  f->a = ...;
  f->b = ...;
  ... etc.
}

In isolation, the struct variant will be faster, because the callee does not have to load the individual pointers from memory: only a limited number of pointer arguments can be passed in registers (six on x86-64 Linux), and the rest are passed on the stack.
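
To make the register-pressure point concrete, here is a sketch with eight outputs, assuming the System V x86-64 calling convention (first six integer or pointer arguments in registers). The type struct results and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result set: eight ints standing in for the elided fields. */
struct results {
    int v[8];
};

/* Eight out-pointers. Under the assumed calling convention, p0..p5 arrive
   in registers while p6 and p7 arrive on the stack, so the callee has to
   load those two pointers before it can store through them. */
void fill_by_pointers(int *p0, int *p1, int *p2, int *p3,
                      int *p4, int *p5, int *p6, int *p7)
{
    *p0 = 0; *p1 = 1; *p2 = 2; *p3 = 3;
    *p4 = 4; *p5 = 5; *p6 = 6; *p7 = 7;
}

/* One pointer in one register; every store is at a fixed offset from it. */
void fill_by_struct(struct results *r)
{
    assert(r != NULL); /* compiled out when NDEBUG is defined */
    for (int i = 0; i < 8; i++)
        r->v[i] = i;
}
```

With only three out-pointers, as in the snippet above, all of them fit in registers under this convention, so the difference only shows up once the argument list grows past six.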

However, the caller context also matters. If the caller looks like this:

int a; double d1; int b; double d2; int c; ...
struct foo f;
fn(&f);
a = f.a;
b = f.b;
... etc.

then the savings will be largely negated by the "unpack foo" code.

But if the caller looks like this:

struct foo f;
fn(&f);
if (f.a != 0) ...
int x = f.a + f.b;
... etc.

then the "unpack" code will not be present.

Employed Russian answered Oct 22 '25 18:10


