When passing arguments to a function, I always assumed that passing them one by one was no different from passing them wrapped in an array, a struct or a tuple. However, a simple experiment showed that I was wrong.
The following program, when compiled with GCC:
#include <array>
#include <tuple>

int test(int a, int b, int c, int d) {
    return a + b + c + d;
}
int test(std::array<int, 4> arr) {
    return arr[0] + arr[1] + arr[2] + arr[3];
}
struct abcd {
    int a; int b; int c; int d;
};
int test(abcd s) {
    return s.a + s.b + s.c + s.d;
}
int test(std::tuple<int, int, int, int> tup) {
    return std::get<0>(tup) + std::get<1>(tup) + std::get<2>(tup) + std::get<3>(tup);
}
...produces a variety of assembly outputs:
impl_test(int, int, int, int):
    lea eax, [rdi+rsi]
    add eax, edx
    add eax, ecx
    ret
impl_test(std::array<int, 4ul>):
    mov rax, rdi
    sar rax, 32
    add eax, edi
    add eax, esi
    sar rsi, 32
    add eax, esi
    ret
impl_test(abcd):
    mov rax, rdi
    sar rax, 32
    add eax, edi
    add eax, esi
    sar rsi, 32
    add eax, esi
    ret
impl_test(std::tuple<int, int, int, int>):
    mov eax, DWORD PTR [rdi+8]
    add eax, DWORD PTR [rdi+12]
    add eax, DWORD PTR [rdi+4]
    add eax, DWORD PTR [rdi]
    ret
main:
    push rbp
    push rbx
    mov ecx, 4
    mov edx, 3
    movabs rbp, 8589934592
    mov esi, 2
    sub rsp, 24
    mov edi, 1
    movabs rbx, 17179869184
    call int test<int, int, int, int>(int, int, int, int)
    mov rdi, rbp
    mov rsi, rbx
    or rbx, 3
    or rdi, 1
    or rsi, 3
    call int test<std::array<int, 4ul> >(std::array<int, 4ul>)
    mov rdi, rbp
    mov rsi, rbx
    or rdi, 1
    call int test<abcd>(abcd)
    mov rdi, rsp
    mov DWORD PTR [rsp], 4
    mov DWORD PTR [rsp+4], 3
    mov DWORD PTR [rsp+8], 2
    mov DWORD PTR [rsp+12], 1
    call int test<std::tuple<int, int, int, int> >(std::tuple<int, int, int, int>)
    add rsp, 24
    xor eax, eax
    pop rbx
    pop rbp
    ret
Why is there a difference?
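When a function is actually called (that is, not inlined, evaluated at compile time as a constexpr, or eliminated), the way its arguments are passed depends on many factors, including the calling convention of the target platform and the types and sizes of the arguments.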
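Let's get back to the example you provided. You compiled the code with -O2, yet every call is still present in main, so none of the functions was inlined or eliminated and each of them really has to be called. The target platform is x86-64, and the registers used (rdi, rsi, rdx, rcx) match the System V calling convention.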
The first function has four 4-byte integer parameters. Each of them is therefore passed in its own register (edi, esi, edx and ecx).
The second function has one fixed-size array of four 4-byte integers. The compiler decided to use two registers (rdi and rsi) to pass the four integers, where rdi = 0x200000001 and rsi = 0x400000003. Notice how the four integers (1, 2, 3, 4) are compactly passed using these two registers, two integers per register.
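The packing can be reproduced by hand. The following is a minimal sketch, assuming a little-endian target such as x86-64. RegisterPair, is_little_endian and pack_like_registers are hypothetical names used only for illustration; the real packing is done by the compiler according to the calling convention, not by code like this.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the two 64-bit argument registers.
struct RegisterPair {
    std::uint64_t rdi;
    std::uint64_t rsi;
};

// Returns true when the least significant byte is stored first.
inline bool is_little_endian() {
    const std::uint32_t one = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}

// Copies the 16 bytes of the array into two 64-bit values, which is
// the bit pattern the caller in the listing loads into rdi and rsi.
inline RegisterPair pack_like_registers(const std::array<int, 4>& values) {
    static_assert(sizeof(std::array<int, 4>) == sizeof(RegisterPair),
                  "expected four 4-byte ints with no padding");
    assert(is_little_endian());  // assumption: x86-64 byte order
    RegisterPair regs;
    std::memcpy(&regs, values.data(), sizeof(regs));
    return regs;
}
```

For the array {1, 2, 3, 4}, pack_like_registers yields rdi == 0x200000001 and rsi == 0x400000003, the same values that appear in the listing.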
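Passing the integers as a single aggregate rather than one by one made the compiler pass them differently. There is a trade-off here between code size, speed and the number of registers required: the call needs two registers instead of four, but the callee needs extra shift instructions (sar) to extract the upper half of each register.
The same thing goes for the third function, which takes the struct: its assembly is identical to that of the array version.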
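To see where the extra instructions in the array and struct versions come from, the callee's work can be mimicked in C++. This is only a sketch of what the sar/add sequence computes, assuming the register values from the listing; sum_from_registers is a hypothetical helper, not something the compiler generates.

```cpp
#include <cstdint>

// Mimics the sar/add sequence of the array and struct versions:
// each 64-bit register carries two 32-bit ints, so the upper one
// has to be shifted down before it can be added.
inline int sum_from_registers(std::uint64_t rdi, std::uint64_t rsi) {
    const auto low  = [](std::uint64_t r) { return static_cast<std::uint32_t>(r); };
    const auto high = [](std::uint64_t r) { return static_cast<std::uint32_t>(r >> 32); };

    // Unsigned arithmetic wraps around like the 32-bit add instruction.
    std::uint32_t eax = high(rdi);  // mov rax, rdi ; sar rax, 32
    eax += low(rdi);                // add eax, edi
    eax += low(rsi);                // add eax, esi
    eax += high(rsi);               // sar rsi, 32 ; add eax, esi
    return static_cast<int>(eax);
}
```

Calling sum_from_registers(0x200000001, 0x400000003) returns 10, the same result as adding 1, 2, 3 and 4 directly.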
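The last function is different: the tuple is not passed in registers at all. main writes its four elements to the stack and passes their address in rdi, and test reads each element back from memory. The calls to std::get do not force this by themselves. A class argument passed by value is handed over by address when the C++ ABI treats its type as non-trivial for the purposes of calls (for example, because of a user-provided copy or move constructor), which is presumably the case for the std::tuple implementation used here. Since you're compiling with C++14, std::get is marked with constexpr, but the contents of the tuple are only known at run time inside test, so constant evaluation does not apply. No call to std::get appears in the output because the compiler inlined it, leaving only the memory accesses in the test function.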
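Whether a class type passed by value can travel in registers also depends on whether its copy constructor, move constructor and destructor are trivial. The sketch below approximates that check with standard type traits. It assumes C++14; looks_trivial_for_calls and with_copy_ctor are hypothetical names, and the traits only approximate the ABI's actual rule. The result for std::tuple depends on the standard library implementation, so it is stored in a constant rather than asserted.

```cpp
#include <array>
#include <tuple>
#include <type_traits>

struct abcd {
    int a; int b; int c; int d;
};

// Hypothetical type with a user-provided copy constructor.
struct with_copy_ctor {
    int a; int b; int c; int d;
    with_copy_ctor() : a(0), b(0), c(0), d(0) {}
    with_copy_ctor(const with_copy_ctor& other)
        : a(other.a), b(other.b), c(other.c), d(other.d) {}
};

// Approximation: true when copying, moving and destroying T are all trivial.
template <typename T>
constexpr bool looks_trivial_for_calls() {
    return std::is_trivially_copy_constructible<T>::value &&
           std::is_trivially_move_constructible<T>::value &&
           std::is_trivially_destructible<T>::value;
}

static_assert(looks_trivial_for_calls<abcd>(),
              "a plain struct of ints is trivial");
static_assert(looks_trivial_for_calls<std::array<int, 4>>(),
              "std::array of ints is trivial");
static_assert(!looks_trivial_for_calls<with_copy_ctor>(),
              "a user-provided copy constructor is not trivial");

// Implementation-dependent, so no assertion is made about this value.
constexpr bool tuple_looks_trivial =
    looks_trivial_for_calls<std::tuple<int, int, int, int>>();
```

Printing tuple_looks_trivial on the toolchain in question is a quick way to check whether this explanation applies to it.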