
Why does std::tuple break small-size struct calling convention optimization in C++?

C++ has a small-size struct calling convention optimization where the compiler passes a small-size struct in function parameters as efficiently as it passes a primitive type (say, via registers). For example:

    class MyInt {
        int n;
    public:
        MyInt(int x) : n(x) {}
    };

    void foo(int);
    void foo(MyInt);

    void bar1() { foo(1); }
    void bar2() { foo(MyInt(1)); }

bar1() and bar2() generate almost identical assembly code except for calling foo(int) and foo(MyInt) respectively. Specifically on x86_64, it looks like:

        mov     edi, 1
        jmp     foo(MyInt) ; tail-call optimization: jmp instead of call + ret

But if we test std::tuple<int>, it will be different:

    #include <tuple>

    void foo(std::tuple<int>);
    void bar3() { foo(std::tuple<int>(1)); }

    struct MyIntTuple : std::tuple<int> {
        using std::tuple<int>::tuple;
    };
    void foo(MyIntTuple);
    void bar4() { foo(MyIntTuple(1)); }

The generated assembly code looks totally different: the small-size struct (std::tuple<int>) is passed by pointer:

        sub     rsp, 24
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], 1
        call    foo(std::tuple<int>)
        add     rsp, 24
        ret

I dug a bit deeper and tried to make my int wrapper a bit "dirtier" (this should be close to an incomplete, naive tuple implementation):

    class Empty {};

    class MyDirtyInt : protected Empty, MyInt {
    public:
        using MyInt::MyInt;
    };

    void foo(MyDirtyInt);
    void bar5() { foo(MyDirtyInt(1)); }

but the calling convention optimization is still applied:

        mov     edi, 1
        jmp     foo(MyDirtyInt)

I have tried GCC/Clang/MSVC, and they all showed the same behavior. (Godbolt link here) So I guess this must be something in the C++ standard? (I believe the C++ standard doesn't specify any ABI constraint, though?)

I'm aware that the compiler should be able to optimize these out, as long as the definition of foo(std::tuple<int>) is visible and not marked noinline. But I want to know which part of the standard or implementation causes the invalidation of this optimization.

FYI, in case you're curious about what I'm doing with std::tuple, I want to create a wrapper class (i.e. the strong typedef) and don't want to declare comparison operators (operator<==>'s prior to C++20) myself and don't want to bother with Boost, so I thought std::tuple was a good base class because everything was there.

YumeYao asked Sep 03 '20 07:09



2 Answers

It seems to be a matter of ABI. For instance, the Itanium C++ ABI reads:

If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.

And, further:

A type is considered non-trivial for the purposes of calls if it has a non-trivial copy constructor, move constructor, or destructor, or all of its copy and move constructors are deleted.
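The quoted rule can be approximated at compile time with standard type traits. The following is only a sketch: `trivial_for_calls_approx`, `PlainInt` and `LoggedInt` are hypothetical names introduced for illustration, and the check is conservative (for example, it ignores the ABI's special handling of deleted constructors and says nothing about the size limit for register passing).

```cpp
#include <type_traits>

// Hypothetical helper (not part of any ABI or library): a conservative
// approximation of "trivial for the purposes of calls".
template <class T>
constexpr bool trivial_for_calls_approx() {
    return std::is_trivially_copy_constructible<T>::value &&
           std::is_trivially_move_constructible<T>::value &&
           std::is_trivially_destructible<T>::value;
}

// All special members are implicit and trivial.
struct PlainInt {
    int n;
};

// Same layout, but the move constructor is user-provided,
// which makes it non-trivial regardless of what it does.
struct LoggedInt {
    int n;
    explicit LoggedInt(int x) : n(x) {}
    LoggedInt(const LoggedInt&) = default;
    LoggedInt(LoggedInt&& other) noexcept : n(other.n) {}
};

static_assert(trivial_for_calls_approx<PlainInt>(),
              "PlainInt should be trivial for the purposes of calls");
static_assert(!trivial_for_calls_approx<LoggedInt>(),
              "a user-provided move constructor makes the type non-trivial");
```

Under the quoted rule, a type like `LoggedInt` has to be passed via a caller-allocated temporary even though it holds a single int.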

The same requirement is in AMD64 ABI Draft 1.0.

For instance, in libstdc++, std::tuple has a non-trivial move constructor: https://godbolt.org/z/4j8vds. The Standard prescribes both the copy and the move constructor as defaulted, which is satisfied here. However, tuple inherits from _Tuple_impl, and _Tuple_impl has a user-defined move constructor. Consequently, the move constructor of tuple itself cannot be trivial.
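The inheritance effect described above can be reproduced without the standard library. In this minimal sketch, `ImplBase` and `Wrapper` are hypothetical stand-ins for the roles of _Tuple_impl and tuple; they are not the actual libstdc++ definitions.

```cpp
#include <type_traits>

// Hypothetical stand-in for an implementation base class.
struct ImplBase {
    int value;
    explicit ImplBase(int v) : value(v) {}
    ImplBase(const ImplBase&) = default;
    // User-provided move constructor: never trivial.
    ImplBase(ImplBase&& other) noexcept : value(other.value) {}
};

// Hypothetical stand-in for the public class: both constructors are
// defaulted, as the Standard prescribes for tuple.
struct Wrapper : ImplBase {
    using ImplBase::ImplBase;
    Wrapper(const Wrapper&) = default;
    Wrapper(Wrapper&&) = default;  // defaulted, yet it calls ImplBase(ImplBase&&)
};

// The copy constructor only calls trivial operations, so it is trivial.
static_assert(std::is_trivially_copy_constructible<Wrapper>::value,
              "copy constructor stays trivial");
// The defaulted move constructor invokes a non-trivial base move constructor.
static_assert(!std::is_trivially_move_constructible<Wrapper>::value,
              "move constructor is non-trivial because of the base class");
```

Defaulting a constructor in the derived class is therefore not enough: the defaulted constructor is trivial only if every base and member operation it selects is trivial as well.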

In contrast, in libc++, both the copy and the move constructor of std::tuple<int> are trivial. Therefore, the argument is passed in a register there: https://godbolt.org/z/WcTjM9.

As for Microsoft STL, std::tuple<int> is neither trivially copy-constructible nor trivially move-constructible. It even seems to break the C++ Standard rules. std::tuple is defined recursively and, at the end of the recursion, the std::tuple<> specialization defines a non-defaulted copy constructor. There is a comment about this issue: // TRANSITION, ABI: should be defaulted. Since tuple<> has no move constructor, both the copy and the move constructors of tuple<class...> are non-trivial.
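The mechanism described for the Microsoft STL can be sketched in isolation. `EmptyTail` and `Node` below are hypothetical names playing the roles of tuple<> and tuple<class...>; this is not the actual Microsoft STL source, only an illustration of how a user-provided copy constructor in a base class affects both operations.

```cpp
#include <type_traits>

// Hypothetical stand-in for the end of the recursion: the copy constructor
// is user-provided, so no implicit move constructor is declared.
struct EmptyTail {
    EmptyTail() = default;
    EmptyTail(const EmptyTail&) noexcept {}
};

// Hypothetical stand-in for one recursion level above it.
struct Node : EmptyTail {
    int head;
    explicit Node(int h) : head(h) {}
    Node(const Node&) = default;
    Node(Node&&) = default;  // moving the base falls back to its copy constructor
};

static_assert(!std::is_trivially_copy_constructible<Node>::value,
              "copying Node calls the user-provided base copy constructor");
static_assert(!std::is_trivially_move_constructible<Node>::value,
              "moving Node also ends up in the base copy constructor");
static_assert(std::is_move_constructible<Node>::value,
              "the defaulted move constructor is still usable");
```

Because the base has no move constructor of its own, the defaulted move constructor of `Node` binds the base rvalue to `const EmptyTail&`, so both operations pick up the same non-trivial constructor.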

Daniel Langr answered Oct 24 '22 00:10



As suggested by @StoryTeller, this behavior might be caused by a user-defined move constructor inside std::tuple.

See for example: https://godbolt.org/z/3M9KWo

Having a user-defined move constructor leads to the non-optimized assembly:

bar_my_tuple():
        sub     rsp, 24
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], 1
        call    foo(MyTuple<int>)
        add     rsp, 24
        ret

In libc++ (libcxx), for example, the copy and move constructors are declared as defaulted for both tuple_leaf and tuple. As a result, you get the small-size struct calling convention optimization for std::tuple<int>, but not for std::tuple<std::string>, which holds a member that is not trivially movable and is therefore not trivially movable itself.
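The difference between the two element types can be checked with type traits. This is a sketch: only the std::tuple<std::string> checks are asserted, because they hold for any element with non-trivial operations. The result for std::tuple<int> depends on the standard library in use, so the hypothetical helper `tuple_int_trivially_movable` only reports it.

```cpp
#include <string>
#include <tuple>
#include <type_traits>

// std::string has a non-trivial move constructor and destructor, so a tuple
// holding one cannot be trivial for the purposes of calls.
static_assert(!std::is_trivially_move_constructible<std::tuple<std::string>>::value,
              "tuple<string> is not trivially move-constructible");
static_assert(!std::is_trivially_destructible<std::tuple<std::string>>::value,
              "tuple<string> is not trivially destructible");

// Hypothetical helper: implementation-dependent result, so it is reported
// rather than asserted.
constexpr bool tuple_int_trivially_movable() {
    return std::is_trivially_move_constructible<std::tuple<int>>::value;
}
```

Printing `tuple_int_trivially_movable()` under different standard libraries is a quick way to see which of them can pass std::tuple<int> in a register.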

Amir Kirsh answered Oct 24 '22 02:10
