C++ has a small-size struct calling convention optimization where the compiler passes a small-size struct in function parameters as efficiently as it passes a primitive type (say, via registers). For example:
    class MyInt {
        int n;
    public:
        MyInt(int x) : n(x) {}
    };

    void foo(int);
    void foo(MyInt);

    void bar1() { foo(1); }
    void bar2() { foo(MyInt(1)); }
bar1() and bar2() generate almost identical assembly code, except that they call foo(int) and foo(MyInt) respectively. Specifically, on x86_64 it looks like:
    mov     edi, 1
    jmp     foo(MyInt)    ; tail-call optimization: jmp instead of call + ret
But if we test std::tuple<int>, it is different:
    void foo(std::tuple<int>);
    void bar3() { foo(std::tuple<int>(1)); }

    struct MyIntTuple : std::tuple<int> {
        using std::tuple<int>::tuple;
    };
    void foo(MyIntTuple);
    void bar4() { foo(MyIntTuple(1)); }
The generated assembly code looks totally different: the small struct (std::tuple<int>) is passed by pointer:
    sub     rsp, 24
    lea     rdi, [rsp+12]
    mov     DWORD PTR [rsp+12], 1
    call    foo(std::tuple<int>)
    add     rsp, 24
    ret
I dug a bit deeper and tried to make my int a bit "dirtier" (this should be close to an incomplete, naive tuple implementation):
    class Empty {};
    class MyDirtyInt : protected Empty, MyInt {
    public:
        using MyInt::MyInt;
    };
    void foo(MyDirtyInt);
    void bar5() { foo(MyDirtyInt(1)); }
but here the calling convention optimization is still applied:
    mov     edi, 1
    jmp     foo(MyDirtyInt)
I have tried GCC/Clang/MSVC, and they all showed the same behavior. (Godbolt link here) So I guess this must be something in the C++ standard? (I believe the C++ standard doesn't specify any ABI constraint, though?)
I'm aware that the compiler should be able to optimize this away as long as the definition of foo(std::tuple<int>) is visible and not marked noinline. But I want to know which part of the standard or of the implementation prevents this optimization.
FYI, in case you're curious about what I'm doing with std::tuple: I want to create a wrapper class (i.e. a strong typedef) without declaring the comparison operators myself (there is no defaulted operator<=> prior to C++20) and without bothering with Boost, so I thought std::tuple was a good base class because everything was already there.
It seems to be a matter of ABI. For instance, the Itanium C++ ABI reads:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.
And, further:
A type is considered non-trivial for the purposes of calls if it has a non-trivial copy constructor, move constructor, or destructor, or all of its copy and move constructors are deleted.
The same requirement is in AMD64 ABI Draft 1.0.
For instance, in libstdc++, std::tuple has a non-trivial move constructor: https://godbolt.org/z/4j8vds. The Standard prescribes both the copy and move constructors as defaulted, which is satisfied here. However, tuple also inherits from _Tuple_impl, and _Tuple_impl has a user-defined move constructor. Consequently, the move constructor of tuple itself cannot be trivial.
By contrast, in libc++ both the copy and move constructors of std::tuple<int> are trivial. Therefore, the argument is passed in a register there: https://godbolt.org/z/WcTjM9.
As for the Microsoft STL, std::tuple<int> is neither trivially copy-constructible nor trivially move-constructible. It even seems to break the C++ Standard rules. std::tuple is defined recursively and, at the end of the recursion, the std::tuple<> specialization defines a non-defaulted copy constructor. There is a comment about this issue: // TRANSITION, ABI: should be defaulted. Since tuple<> has no move constructor, both the copy and move constructors of tuple<class...> are non-trivial.
As suggested by @StoryTeller, this behavior may be caused by a user-defined move constructor inside std::tuple.
See, for example: https://godbolt.org/z/3M9KWo
Having a user-defined move constructor leads to the non-optimized assembly:
    bar_my_tuple():
            sub     rsp, 24
            lea     rdi, [rsp+12]
            mov     DWORD PTR [rsp+12], 1
            call    foo(MyTuple<int>)
            add     rsp, 24
            ret
In libc++ (libcxx), for example, the copy and move constructors are declared as defaulted both for tuple_leaf and for tuple. As a result, you get the small-struct calling convention optimization for std::tuple<int>, but not for std::tuple<std::string>: the latter holds a member that is not trivially movable, so it naturally is not trivially movable itself.