Structs and unions: which is better from a performance point of view? Passing the parameter by value or pointer?

Question

It's probably a silly question, but it makes me slightly quibble every time I want to "optimize" the passage of heavy arguments (such as structure for example) to a function that just reads them. I hesitate between passing a pointer:

struct Foo
{
    int x;
    int y;
    int z;
} Foo;

int sum(struct Foo *foo_struct)
{
    return foo_struct->x + foo_struct->y + foo_struct->z;
}

Or a constant:

struct Foo
{
    int x;
    int y;
    int z;
} Foo;

int sum(const struct Foo foo_struct)
{
    return foo_struct.x + foo_struct.y + foo_struct.z;
}

The pointers are intended not to copy the data but just to send its address, which costs almost nothing.

For constants, it probably varies between compilers or optimization levels, although I don't know how a constant pass is optimized; if it is, then the compiler probably does a better job than I do.

From a performance point of view only (even if it is negligible in my examples), what is the preferred way of doing things?

Marco Bonelli · Accepted Answer

Structs, much like arrays, are containers of data. Every time you work with a container, you will have its data layed out in a contiguous block of memory. The container itself is identified by its starting address, and every single time you operate with it, your program will need to do low level pointer arithmetic through dedicated instructions in order to apply an offset to get from the starting address to the desired field (or element in case of arrays). The only things that a compiler needs to know to work with a struct are (roughly):

Its starting address in memory.
The offset of each field.
The size of each field.

A compiler can optimize code working on structs in the same way if the struct is passed as pointer or not, and we'll see how in a moment. What's different though, it's how the struct is passed to each function.

First let me make one thing clear: the const qualifier is not useful to understand the difference between passing a structure as pointer or by value. It merely tells the compiler that inside the function the value of the parameter itself will not be modified. Performance difference between passing as value or as pointer is not affected in general by const. The const keyword only becomes useful for other kinds of optimizations, not this one.

The main difference between these two signatures:

void first(const struct mystruct x);
void second(struct mystruct *x);

is that the first function will expect the whole struct to be passed as parameter, which therefore means copying the whole structure on the stack right before calling the function. The second function however only needs a pointer to the structure, and therefore the argument can be passed as a single value on the stack, or in a register like it's usually done in x86-64.

Now, to better understand what happens, let's analyze the following program:

#include <stdio.h>

struct mystruct {
    unsigned a, b, c, d, e, f, g, h, i, j, k;
};

unsigned long __attribute__ ((noinline)) first(const struct mystruct x) {
    unsigned long total = x.a;
    total += x.b;
    total += x.c;
    total += x.d;
    total += x.e;
    total += x.f;
    total += x.g;
    total += x.h;
    total += x.i;
    total += x.j;
    total += x.k;

    return total;
}

unsigned long __attribute__ ((noinline)) second(struct mystruct *x) {
    unsigned long total = x->a;
    total += x->b;
    total += x->c;
    total += x->d;
    total += x->e;
    total += x->f;
    total += x->g;
    total += x->h;
    total += x->i;
    total += x->j;
    total += x->k;

    return total;
}

int main (void) {
    struct mystruct x = {0};
    scanf("%u", &x.a);

    unsigned long v = first(x);
    printf("%lu
", v);

    v = second(&x);
    printf("%lu
", v);

    return 0;
}

The __attribute__ ((noinline)) is just to avoid automatic inlining of the function, which for testing purposes is very simple and therefore will probably get inlined with -O3.

Let's now compile and disassemble the result with the help of objdump.

No optimizations

Let's first compile without optimizations and see what happens:

This is how main() calls first():

  86a:   48 89 e0                mov    rax,rsp
  86d:   48 8b 55 c0             mov    rdx,QWORD PTR [rbp-0x40]
  871:   48 89 10                mov    QWORD PTR [rax],rdx
  874:   48 8b 55 c8             mov    rdx,QWORD PTR [rbp-0x38]
  878:   48 89 50 08             mov    QWORD PTR [rax+0x8],rdx
  87c:   48 8b 55 d0             mov    rdx,QWORD PTR [rbp-0x30]
  880:   48 89 50 10             mov    QWORD PTR [rax+0x10],rdx
  884:   48 8b 55 d8             mov    rdx,QWORD PTR [rbp-0x28]
  888:   48 89 50 18             mov    QWORD PTR [rax+0x18],rdx
  88c:   48 8b 55 e0             mov    rdx,QWORD PTR [rbp-0x20]
  890:   48 89 50 20             mov    QWORD PTR [rax+0x20],rdx
  894:   8b 55 e8                mov    edx,DWORD PTR [rbp-0x18]
  897:   89 50 28                mov    DWORD PTR [rax+0x28],edx
  89a:   e8 81 fe ff ff          call   720 <first>

And this is the function itself:

 0000000000000720 <first>:
  720:   55                      push   rbp
  721:   48 89 e5                mov    rbp,rsp
  724:   8b 45 10                mov    eax,DWORD PTR [rbp+0x10]
  727:   89 c0                   mov    eax,eax
  729:   48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
  72d:   8b 45 14                mov    eax,DWORD PTR [rbp+0x14]
  730:   89 c0                   mov    eax,eax
  732:   48 01 45 f8             add    QWORD PTR [rbp-0x8],rax
  736:   8b 45 18                mov    eax,DWORD PTR [rbp+0x18]
  739:   89 c0                   mov    eax,eax
  ... same stuff happening over and over ...
  783:   48 01 45 f8             add    QWORD PTR [rbp-0x8],rax
  787:   48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
  78b:   5d                      pop    rbp
  78c:   c3                      ret

It's quite obvious that the whole structure is being copied on the stack before calling the function.

The function then takes each value in the struct looking at each value contained in the struct on the stack each time (DWORD PTR [rbp + offset]).

This is how main() calls second():

  8bf:   48 8d 45 c0             lea    rax,[rbp-0x40]
  8c3:   48 89 c7                mov    rdi,rax
  8c6:   e8 c2 fe ff ff          call   78d <second>

And this is the function itself:

 000000000000078d <second>:
  78d:   55                      push   rbp
  78e:   48 89 e5                mov    rbp,rsp
  791:   48 89 7d e8             mov    QWORD PTR [rbp-0x18],rdi
  795:   48 8b 45 e8             mov    rax,QWORD PTR [rbp-0x18]
  799:   8b 00                   mov    eax,DWORD PTR [rax]
  79b:   89 c0                   mov    eax,eax
  79d:   48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
  7a1:   48 8b 45 e8             mov    rax,QWORD PTR [rbp-0x18]
  7a5:   8b 40 04                mov    eax,DWORD PTR [rax+0x4]
  7a8:   89 c0                   mov    eax,eax
  ... same stuff happening over and over ...
  81f:   48 01 45 f8             add    QWORD PTR [rbp-0x8],rax
  823:   48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
  827:   5d                      pop    rbp
  828:   c3                      ret

You can see that the argument is passed as a pointer instead of being copied on the stack, which is only two very simple instructions (lea + mov). However, since now the function has to work with a pointer using the -> operator, we see that every single time a value in the struct needs to be accessed, memory needs to be dereferenced two times instead of one (first to get the pointer to the structure from the stack, then to get the value at the specified offset in the struct).

It may seem that there is no real difference between the two functions, since the linear number of instructions (linear in terms of struct members) that was required to load the struct on the stack in the first case is still required to dereference the pointer another time in the second case.

We are talking about optimization though, and it makes no sense to not optimize the code. Let's see what happens if we do.

With optimizations

In reality, when working with a struct, we don't really care where it is in memory (stack, heap, data segment, whatever). As long as we know where it starts, it all boils down to applying the same simple pointer arithmetic to access the fields. This always needs to be done, regardless of where the structure resides or whether it was dynamically allocated or not.

If we optimize the code above with -O3, we now see the following:

This is how main() calls first():

  61a:   48 83 ec 30             sub    rsp,0x30
  61e:   48 8b 44 24 30          mov    rax,QWORD PTR [rsp+0x30]
  623:   48 89 04 24             mov    QWORD PTR [rsp],rax
  627:   48 8b 44 24 38          mov    rax,QWORD PTR [rsp+0x38]
  62c:   48 89 44 24 08          mov    QWORD PTR [rsp+0x8],rax
  631:   48 8b 44 24 40          mov    rax,QWORD PTR [rsp+0x40]
  636:   48 89 44 24 10          mov    QWORD PTR [rsp+0x10],rax
  63b:   48 8b 44 24 48          mov    rax,QWORD PTR [rsp+0x48]
  640:   48 89 44 24 18          mov    QWORD PTR [rsp+0x18],rax
  645:   48 8b 44 24 50          mov    rax,QWORD PTR [rsp+0x50]
  64a:   48 89 44 24 20          mov    QWORD PTR [rsp+0x20],rax
  64f:   8b 44 24 58             mov    eax,DWORD PTR [rsp+0x58]
  653:   89 44 24 28             mov    DWORD PTR [rsp+0x28],eax
  657:   e8 74 01 00 00          call   7d0 <first>

And this is the function itself:

 00000000000007d0 <first>:
  7d0:   8b 44 24 0c             mov    eax,DWORD PTR [rsp+0xc]
  7d4:   8b 54 24 08             mov    edx,DWORD PTR [rsp+0x8]
  7d8:   48 01 c2                add    rdx,rax
  7db:   8b 44 24 10             mov    eax,DWORD PTR [rsp+0x10]
  7df:   48 01 d0                add    rax,rdx
  7e2:   8b 54 24 14             mov    edx,DWORD PTR [rsp+0x14]
  7e6:   48 01 d0                add    rax,rdx
  7e9:   8b 54 24 18             mov    edx,DWORD PTR [rsp+0x18]
  7ed:   48 01 c2                add    rdx,rax
  7f0:   8b 44 24 1c             mov    eax,DWORD PTR [rsp+0x1c]
  7f4:   48 01 c2                add    rdx,rax
  7f7:   8b 44 24 20             mov    eax,DWORD PTR [rsp+0x20]
  7fb:   48 01 d0                add    rax,rdx
  7fe:   8b 54 24 24             mov    edx,DWORD PTR [rsp+0x24]
  802:   48 01 d0                add    rax,rdx
  805:   8b 54 24 28             mov    edx,DWORD PTR [rsp+0x28]
  809:   48 01 c2                add    rdx,rax
  80c:   8b 44 24 2c             mov    eax,DWORD PTR [rsp+0x2c]
  810:   48 01 c2                add    rdx,rax
  813:   8b 44 24 30             mov    eax,DWORD PTR [rsp+0x30]
  817:   48 01 d0                add    rax,rdx
  81a:   c3                      ret

This is how main() calls second():

  671:   48 89 df                mov    rdi,rbx
  674:   e8 a7 01 00 00          call   820 <second>

And this is the function itself:

 0000000000000820 <second>:
  820:   8b 47 04                mov    eax,DWORD PTR [rdi+0x4]
  823:   8b 17                   mov    edx,DWORD PTR [rdi]
  825:   48 01 c2                add    rdx,rax
  828:   8b 47 08                mov    eax,DWORD PTR [rdi+0x8]
  82b:   48 01 d0                add    rax,rdx
  82e:   8b 57 0c                mov    edx,DWORD PTR [rdi+0xc]
  831:   48 01 d0                add    rax,rdx
  834:   8b 57 10                mov    edx,DWORD PTR [rdi+0x10]
  837:   48 01 c2                add    rdx,rax
  83a:   8b 47 14                mov    eax,DWORD PTR [rdi+0x14]
  83d:   48 01 c2                add    rdx,rax
  840:   8b 47 18                mov    eax,DWORD PTR [rdi+0x18]
  843:   48 01 d0                add    rax,rdx
  846:   8b 57 1c                mov    edx,DWORD PTR [rdi+0x1c]
  849:   48 01 d0                add    rax,rdx
  84c:   8b 57 20                mov    edx,DWORD PTR [rdi+0x20]
  84f:   48 01 c2                add    rdx,rax
  852:   8b 47 24                mov    eax,DWORD PTR [rdi+0x24]
  855:   48 01 c2                add    rdx,rax
  858:   8b 47 28                mov    eax,DWORD PTR [rdi+0x28]
  85b:   48 01 d0                add    rax,rdx
  85e:   c3                      ret

It should now be clear which code is better. The compiler successfully identified that all it needs in both cases is to know where the beginning of the structure is, and then it can just apply the same simple math to determine the position of each field. Whether the address is on the stack or somewhere else, it does not really matter.

In fact, in the first() case we see all fields being accessed through [rsp + offset], meaning that some address on the stack itself (rsp) is used to calculate the position of the fields, while in the second() case we see [rdi + offset], meaning that the address passed as parameter (in rdi) is used instead. The offsets though are still the same.

So what's the difference now between the two functions? In terms of function code itself, basically none. In terms of parameter passing, the first() function still needs the struct passed by value, and therefore even with optimizations enabled, the whole structure still needs to be copied on the stack, therefore we can see that the first() function is way heavier and adds a lot of code in the caller.

As I previously said, a compiler can optimize code working on structs in the same way if the struct is passed as pointer or not. However, as we just saw, the way the structure is passed makes a big difference in the caller.

One could argue that the const qualifier for the first() function could ring a bell for the compiler and make it understand that there is really no need to copy the data on the stack, and the caller could just pass a pointer. However the compiler should strictly adhere to the calling convention dictated by the ABI for a given signature, instead of going out of its way to optimize the code. After all, it's not really the compiler's fault in this case, but the programmer's fault.

So, to answer your question:

From a performance point of view only (even if it is negligible in my examples), what is the preferred way of doing things?

The preferred way is definitely to pass a pointer, and not the struct itself.

Structs and unions: which is better from a performance point of view? Passing the parameter by value or pointer?

Tags:

performance

c

pointers

constants

Foxy

1 Answers

No optimizations

With optimizations

Marco Bonelli

Recent Activity

Donate For Us

Structs and unions: which is better from a performance point of view? Passing the parameter by value or pointer?

Tags:

performance

c

pointers

constants

Foxy

1 Answers

No optimizations

With optimizations

Marco Bonelli

Related questions

Recent Activity

Donate For Us