
Do GCC and Clang optimize field-by-field struct copy?

Tags: c++, c, gcc, struct, clang

E.g. given

typedef struct A {
    int a;
    int b;
    int c;
} A;

typedef struct B {
    int d;
    int e;
    int f;
} B;

void f(B& b1, A& a2) {
    b1.d = a2.a;
    b1.e = a2.b;
    b1.f = a2.c;
}

f could be replaced by a memcpy (especially if the structs had more fields).

  1. Will both versions produce equivalent code?

  2. What if the structure we copy to has fewer fields than A? I.e.

    typedef struct C {
        int g;
        int h;
    } C;
    
    void h(C& c1, A& a2) {
        c1.g = a2.a;
        c1.h = a2.b;
    }
    

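For the smaller-struct case, the memcpy alternative would copy only `sizeof(C)` bytes. A minimal sketch (the name `h_memcpy` is hypothetical, not from the question):

```cpp
#include <cstring>

// Layouts copied from the question.
typedef struct A { int a, b, c; } A;
typedef struct C { int g, h; } C;

// Copying into the smaller struct reads only sizeof(C) bytes,
// i.e. only A's leading two members. This is valid only because
// C's layout matches a prefix of A's.
void h_memcpy(C& c1, const A& a2) {
    std::memcpy(&c1, &a2, sizeof(C));
}
```

If the generated code reorders fields, this shortcut no longer applies and the field-by-field form is the safe one.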
I am interested because I am generating code which includes struct copies like this, normally changing the order of fields, and I want to know if these cases should be treated specially.

C tag included because I expect the behavior in C to be the same (modulo pointers instead of references).

Asked by Alexey Romanov on Dec 09 '16.



2 Answers

According to godbolt.org, x86-64 gcc 6.2 with -O2 produces

mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], eax
mov eax, DWORD PTR [rsi+4]
mov DWORD PTR [rdi+4], eax
mov eax, DWORD PTR [rsi+8]
mov DWORD PTR [rdi+8], eax

for field-by-field copy,

mov rax, QWORD PTR [rsi]
mov QWORD PTR [rdi], rax
mov eax, DWORD PTR [rsi+8]
mov DWORD PTR [rdi+8], eax

for the memcpy version. Clang and ICC show similar differences. A bit disappointing.
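The whole-struct copy that produced the second listing would look roughly like this (an assumed reconstruction; the name `f_memcpy` is hypothetical). GCC lowers the 12-byte memcpy to one 8-byte and one 4-byte move, as shown above:

```cpp
#include <cstring>

typedef struct A { int a, b, c; } A;
typedef struct B { int d, e, f; } B;

// memcpy variant of f: legal here only because A and B have
// identical member counts, types, and ordering.
void f_memcpy(B& b1, const A& a2) {
    std::memcpy(&b1, &a2, sizeof(B));
}
```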

Answered by Alexey Romanov, Oct 02 '22.


Your test case does not load and store enough memory for a conversion to memcpy to be worthwhile. Using twice as many members:

typedef struct A { int a, b, c, p, q, r; } A;
typedef struct B { int d, e, f, s, t, u; } B;
void f(B& b1, A& a2) {
  b1.d = a2.a;
  b1.e = a2.b;
  b1.f = a2.c;
  b1.s = a2.p;
  b1.t = a2.q;
  b1.u = a2.r;
}

... LLVM optimizes the code to:

f(B&, A&):                             # @f(B&, A&)
        movups  (%rsi), %xmm0
        movups  %xmm0, (%rdi)
        movl    16(%rsi), %eax
        movl    %eax, 16(%rdi)
        movl    20(%rsi), %eax
        movl    %eax, 20(%rdi)
        retq

... with an unaligned 16-byte load/store copying the first four members.
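A quick sanity check (a hypothetical harness, not part of the answer) that the six-member field-by-field copy and a whole-struct memcpy produce identical bytes when the layouts match:

```cpp
#include <cstring>

typedef struct A { int a, b, c, p, q, r; } A;
typedef struct B { int d, e, f, s, t, u; } B;

// Field-by-field copy from the answer above.
void f(B& b1, const A& a2) {
    b1.d = a2.a; b1.e = a2.b; b1.f = a2.c;
    b1.s = a2.p; b1.t = a2.q; b1.u = a2.r;
}

// memcpy variant (hypothetical name): same result because the
// two structs have identical member layouts.
void f_memcpy(B& b1, const A& a2) {
    std::memcpy(&b1, &a2, sizeof(B));
}
```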

Answered by Richard Smith, Oct 01 '22.