E.g. given
typedef struct A {
    int a;
    int b;
    int c;
} A;
typedef struct B {
    int d;
    int e;
    int f;
} B;
void f(B& b1, A& a2) {
    b1.d = a2.a;
    b1.e = a2.b;
    b1.f = a2.c;
}
the function f could be replaced by a memcpy (especially if the structs had more fields).
Will both versions produce equivalent code?
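For concreteness, a memcpy-based counterpart of f might look like the sketch below. It is only valid under the assumption that A and B have the same size and matching member offsets, which the static_asserts check at compile time. The name f_memcpy and the const qualifier on the source are illustrative choices, not part of the original code.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

struct A { int a; int b; int c; };
struct B { int d; int e; int f; };

// Assumptions the memcpy relies on, checked at compile time.
static_assert(std::is_trivially_copyable<A>::value &&
              std::is_trivially_copyable<B>::value,
              "memcpy requires trivially copyable types");
static_assert(sizeof(A) == sizeof(B), "sizes must match");
static_assert(offsetof(A, a) == offsetof(B, d), "first member offsets differ");
static_assert(offsetof(A, b) == offsetof(B, e), "second member offsets differ");
static_assert(offsetof(A, c) == offsetof(B, f), "third member offsets differ");

// Hypothetical memcpy counterpart of f: copies all of A's bytes into B.
void f_memcpy(B& b1, const A& a2) {
    std::memcpy(&b1, &a2, sizeof(B));
}
```

If any of the layout assumptions stops holding (for example after a field is added to one struct only), the build fails instead of silently copying the wrong bytes.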
What if the structure we copy to has fewer fields than A? I.e.
typedef struct C {
    int g;
    int h;
} C;
void h(C& c1, A& a2) {
    c1.g = a2.a;
    c1.h = a2.b;
}
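The shorter-target case can be written with memcpy as well, as long as only sizeof(C) bytes are copied. The sketch below assumes that the members of C line up with the leading members of A (checked by the static_asserts); h_memcpy is a made-up name for illustration.

```cpp
#include <cstddef>
#include <cstring>

struct A { int a; int b; int c; };
struct C { int g; int h; };

// Assumptions: C is no larger than A and its members sit at the
// same offsets as the first two members of A.
static_assert(sizeof(C) <= sizeof(A), "target must not be larger than source");
static_assert(offsetof(A, a) == offsetof(C, g), "first member offsets differ");
static_assert(offsetof(A, b) == offsetof(C, h), "second member offsets differ");

// Hypothetical memcpy counterpart of h: copies only the bytes that C holds,
// so a2.c is never read into c1.
void h_memcpy(C& c1, const A& a2) {
    std::memcpy(&c1, &a2, sizeof(C));
}
```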
I am interested because I am generating code which includes struct copies like this, normally changing the order of fields, and I want to know if these cases should be treated specially.
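One way a code generator might decide whether a pair of structs needs special treatment is to compare member offsets at compile time and fall back to field-by-field assignment when they differ. The following is a sketch under that assumption; the struct R, the constant kSameOffsets, and the function copy_A_to_R are hypothetical names, and the field mapping (x from a, y from b, z from c) is invented for the example.

```cpp
#include <cstddef>
#include <cstring>

struct A { int a; int b; int c; };
struct R { int y; int x; int z; };  // hypothetical target with reordered fields

// True only if every mapped field sits at the same offset in both structs.
constexpr bool kSameOffsets =
    sizeof(R) == sizeof(A) &&
    offsetof(R, x) == offsetof(A, a) &&
    offsetof(R, y) == offsetof(A, b) &&
    offsetof(R, z) == offsetof(A, c);

// Assumed mapping: R.x = A.a, R.y = A.b, R.z = A.c.
void copy_A_to_R(R& r, const A& s) {
    if (kSameOffsets) {
        // Layouts agree, so a bulk copy is equivalent.
        std::memcpy(&r, &s, sizeof(R));
    } else {
        // Layouts differ (as they do here), so copy member by member.
        r.x = s.a;
        r.y = s.b;
        r.z = s.c;
    }
}
```

Because kSameOffsets is a compile-time constant, the untaken branch can be dropped by the compiler; here the reordering makes it false, so only the member-wise path remains.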
C tag included because I expect behavior in C is the same (modulo pointers instead of references).
According to godbolt.org, x86-64 gcc 6.2 with -O2 produces
mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], eax
mov eax, DWORD PTR [rsi+4]
mov DWORD PTR [rdi+4], eax
mov eax, DWORD PTR [rsi+8]
mov DWORD PTR [rdi+8], eax
for the field-by-field copy, and
mov rax, QWORD PTR [rsi]
mov QWORD PTR [rdi], rax
mov eax, DWORD PTR [rsi+8]
mov DWORD PTR [rdi+8], eax
for memcpy. Both clang and icc show similar differences. A bit disappointing.
Your test case does not load and store enough memory for a conversion to memcpy to be worthwhile. Using twice as many members:
typedef struct A { int a, b, c, p, q, r; } A;
typedef struct B { int d, e, f, s, t, u; } B;
void f(B& b1, A& a2) {
    b1.d = a2.a;
    b1.e = a2.b;
    b1.f = a2.c;
    b1.s = a2.p;
    b1.t = a2.q;
    b1.u = a2.r;
}
... LLVM optimizes the code to:
f(B&, A&): # @f(B&, A&)
movups (%rsi), %xmm0
movups %xmm0, (%rdi)
movl 16(%rsi), %eax
movl %eax, 16(%rdi)
movl 20(%rsi), %eax
movl %eax, 20(%rdi)
retq
... with an unaligned 16-byte load/store copying the first four members.