When I compile following code with gcc 6 -O3 -std=c++14
, I get nice and empty main
:
Dump of assembler code for function main():
0x00000000004003e0 <+0>: xor %eax,%eax
0x00000000004003e2 <+2>: retq
But uncommenting last line in main "breaks" optimization:
Dump of assembler code for function main():
0x00000000004005f0 <+0>: sub $0x78,%rsp
0x00000000004005f4 <+4>: lea 0x40(%rsp),%rdi
0x00000000004005f9 <+9>: movq $0x400838,0x10(%rsp)
0x0000000000400602 <+18>: movb $0x0,0x18(%rsp)
0x0000000000400607 <+23>: mov %fs:0x28,%rax
0x0000000000400610 <+32>: mov %rax,0x68(%rsp)
0x0000000000400615 <+37>: xor %eax,%eax
0x0000000000400617 <+39>: movl $0x0,(%rsp)
0x000000000040061e <+46>: movq $0x400838,0x30(%rsp)
0x0000000000400627 <+55>: movb $0x0,0x38(%rsp)
0x000000000040062c <+60>: movl $0x0,0x20(%rsp)
0x0000000000400634 <+68>: movq $0x400838,0x50(%rsp)
0x000000000040063d <+77>: movb $0x0,0x58(%rsp)
0x0000000000400642 <+82>: movl $0x0,0x40(%rsp)
0x000000000040064a <+90>: callq 0x400790 <ErasedObject::~ErasedObject()>
0x000000000040064f <+95>: lea 0x20(%rsp),%rdi
0x0000000000400654 <+100>: callq 0x400790 <ErasedObject::~ErasedObject()>
0x0000000000400659 <+105>: mov %rsp,%rdi
0x000000000040065c <+108>: callq 0x400790 <ErasedObject::~ErasedObject()>
0x0000000000400661 <+113>: mov 0x68(%rsp),%rdx
0x0000000000400666 <+118>: xor %fs:0x28,%rdx
0x000000000040066f <+127>: jne 0x400678 <main()+136>
0x0000000000400671 <+129>: xor %eax,%eax
0x0000000000400673 <+131>: add $0x78,%rsp
0x0000000000400677 <+135>: retq
0x0000000000400678 <+136>: callq 0x4005c0 <__stack_chk_fail@plt>
Code
#include <type_traits>
#include <new>
namespace
{
struct ErasedTypeVTable
{
using destructor_t = void (*)(void *obj);
destructor_t dtor;
};
template <typename T>
void dtor(void *obj)
{
return static_cast<T *>(obj)->~T();
}
template <typename T>
static const ErasedTypeVTable erasedTypeVTable = {
&dtor<T>
};
}
struct ErasedObject
{
std::aligned_storage<sizeof(void *)>::type storage;
const ErasedTypeVTable& vtbl;
bool flag = false;
template <typename T, typename S = typename std::decay<T>::type>
ErasedObject(T&& obj)
: vtbl(erasedTypeVTable<S>)
{
static_assert(sizeof(T) <= sizeof(storage) && alignof(T) <= alignof(decltype(storage)), "");
new (object()) S(std::forward<T>(obj));
}
ErasedObject(ErasedObject&& other) = default;
~ErasedObject()
{
if (flag)
{
::operator delete(object());
}
else
{
vtbl.dtor(object());
}
}
void *object()
{
return reinterpret_cast<char *>(&storage);
}
};
struct myType
{
int a;
};
int main()
{
ErasedObject c1(myType{});
ErasedObject c2(myType{});
//ErasedObject c3(myType{});
}
clang can optimize-out both versions.
Any ideas what's going on? Am I hitting some optimization limit? If so, is it configurable?
GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O , this option increases both compilation time and the performance of the generated code.
The compiler optimizes to reduce the size of the binary instead of execution speed. If you do not specify an optimization option, gcc attempts to reduce the compilation time and to make debugging always yield the result expected from reading the source code.
Use the command-line option -O0 (-[capital o][zero]) to disable optimization, and -S to get assembly file. Look here to see more gcc command-line options. Show activity on this post.
I ran g++
with -fdump-ipa-inline
to get more information about why functions are or are not inlined.
For the testcase with main() function and three objects created I got:
(...)
150 Deciding on inlining of small functions. Starting with size 35.
151 Enqueueing calls in void {anonymous}::dtor(void*) [with T = myType]/40.
152 Enqueueing calls in int main()/35.
153 not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
154 not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
155 not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
(...)
This error code is set in gcc/gcc/ipa-inline.c:
else if (!e->maybe_hot_p ()
&& (growth >= MAX_INLINE_INSNS_SINGLE
|| growth_likely_positive (callee, growth)))
{
e->inline_failed = CIF_UNLIKELY_CALL;
want_inline = false;
}
Then I discovered, that the smallest change to make g++ inline these functions is to add a declaration:
int main() __attribute__((hot));
I wasn't able to find in code why int main()
isn't considered hot, but probably this should be left for another question.
More interesting is the the second part of the conditional I pasted above. The intent was to not inline when the code will grow and you produced an example when the code shrinks after complete inlining.
I think this deserves to be reported on GCC's bugzilla, but I'm not sure if you can call it a bug - estimation of inline impact is a heuristic and as such it is expected to work correctly in most cases, not all of them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With