I am using an MS-specific keyword (__forceinline) to force a global function to be inlined, but I noticed that the function fails to be inlined if it uses an object that has an explicitly defined, empty destructor.
Quoting from MSDN:
Even with __forceinline, the compiler cannot inline code in all circumstances. The compiler cannot inline a function if:
- The function or its caller is compiled with /Ob0 (the default option for debug builds).
- The function and the caller use different types of exception handling (C++ exception handling in one, structured exception handling in the other).
- The function has a variable argument list.
- The function uses inline assembly, unless compiled with /Og, /Ox, /O1, or /O2.
- The function is recursive and not accompanied by #pragma inline_recursion(on). With the pragma, recursive functions are inlined to a default depth of 16 calls. To reduce the inlining depth, use the inline_depth pragma.
- The function is virtual and is called virtually. Direct calls to virtual functions can be inlined.
- The program takes the address of the function and the call is made via the pointer to the function. Direct calls to functions that have had their address taken can be inlined.
- The function is also marked with the naked __declspec modifier.
I am trying the following self contained program to test the behavior
#include <iostream>
#define INLINE __forceinline

template <class T>
struct rvalue
{
    T& r_;
    explicit INLINE rvalue(T& r) : r_(r) {}
};

template <class T>
INLINE
T movz(T& t)
{
    return T(rvalue<T>(t));
}

template <class T>
class Spam
{
public:
    INLINE operator rvalue<Spam>() { return rvalue<Spam>(*this); }
    INLINE Spam() : m_value(0) {}
    INLINE Spam(rvalue<Spam> p) : m_value(p.r_.m_value) {}
    INLINE Spam& operator= (rvalue<Spam> p)
    {
        m_value = p.r_.m_value;
        return *this;
    }
    INLINE explicit Spam(T value) : m_value(value) { }
    INLINE operator T() { return m_value; }
    template <class U, class E> INLINE Spam& operator= (Spam<U> u) { return *this; }
    INLINE ~Spam() {}
private:
    Spam(Spam<T>&); // not defined
    Spam& operator= (Spam&); // not defined
private:
    T m_value;
};

INLINE int foo()
{
    Spam<int> p1(int(5)), p2;
    p2 = movz(p1);
    return p2;
}

int main()
{
    std::cout << foo() << std::endl;
}
With the empty destructor INLINE ~Spam() {} in place, we get the following disassembly:
int main()
{
000000013F4B1010 sub rsp,28h
std::cout << foo() << std::endl;
000000013F4B1014 lea rdx,[rsp+30h]
000000013F4B1019 lea rcx,[rsp+38h]
000000013F4B101E mov dword ptr [rsp+30h],5
000000013F4B1026 call movz<Spam<int> > (013F4B1000h)
000000013F4B102B mov rcx,qword ptr [__imp_std::cout (013F4B2050h)]
000000013F4B1032 mov edx,dword ptr [rax]
000000013F4B1034 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013F4B2040h)]
000000013F4B103A mov rdx,qword ptr [__imp_std::endl (013F4B2048h)]
000000013F4B1041 mov rcx,rax
000000013F4B1044 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013F4B2058h)]
}
whereas without the destructor INLINE ~Spam() {}, we get the following disassembly:
int main()
{
000000013FF01000 sub rsp,28h
std::cout << foo() << std::endl;
000000013FF01004 mov rcx,qword ptr [__imp_std::cout (013FF02050h)]
000000013FF0100B mov edx,5
000000013FF01010 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013FF02040h)]
000000013FF01016 mov rdx,qword ptr [__imp_std::endl (013FF02048h)]
000000013FF0101D mov rcx,rax
000000013FF01020 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013FF02058h)]
}
000000013FF01026 xor eax,eax
}
I fail to understand why, in the presence of the destructor, the compiler does not inline the function T movz(T& t).
An inline function is one for which the compiler copies the code from the function definition directly into the code of the calling function rather than creating a separate set of instructions in memory. This eliminates call-linkage overhead and can expose significant optimization opportunities.
The inline keyword was adopted from C++, but in C++, if a function is declared inline, it must be declared inline in every translation unit, and every definition of an inline function must be exactly the same (in C, the definitions may be different, and depending on the differences only results in unspecified ...
Standard support: C++ and C99, but not their predecessors K&R C and C89, support inline functions, though with different semantics. In both cases, inline does not force inlining; the compiler is free to choose not to inline the function at all, or only in some cases.
Yes, it's a bug. I tested it with the MinGW compiler under Qt, and it optimizes everything very well.
First, I changed your code slightly, as below, to make the assembly easier to read:
int main()
{
    int i = foo();
    std::cout << i << std::endl;
}
And here is the debug disassembly from my Qt setup:
45 int main()
46 {
0x401600 lea 0x4(%esp),%ecx
0x401604 <+0x0004> and $0xfffffff0,%esp
0x401607 <+0x0007> pushl -0x4(%ecx)
0x40160a <+0x000a> push %ebp
0x40160b <+0x000b> mov %esp,%ebp
0x40160d <+0x000d> push %ecx
0x40160e <+0x000e> sub $0x54,%esp
0x401611 <+0x0011> call 0x402160 <__main>
0x401616 <+0x0016> movl $0x5,-0x10(%ebp)
47 int i = foo();
0x401683 <+0x0083> mov %eax,-0xc(%ebp)
48 std::cout << i << std::endl;
0x401686 <+0x0086> mov -0xc(%ebp),%eax
0x401689 <+0x0089> mov %eax,(%esp)
0x40168c <+0x008c> mov $0x6fcba2c0,%ecx
0x401691 <+0x0091> call 0x401714 <_ZNSolsEi>
0x401696 <+0x0096> sub $0x4,%esp
0x401699 <+0x0099> movl $0x40171c,(%esp)
0x4016a0 <+0x00a0> mov %eax,%ecx
0x4016a2 <+0x00a2> call 0x401724 <_ZNSolsEPFRSoS_E>
0x4016a7 <+0x00a7> sub $0x4,%esp
49 }
0x4016aa <+0x00aa> mov $0x0,%eax
0x4016af <+0x00af> mov -0x4(%ebp),%ecx
0x4016b2 <+0x00b2> leave
0x4016b3 <+0x00b3> lea -0x4(%ecx),%esp
0x4016b6 <+0x00b6> ret
You can see that even foo() is optimized: the variable i is directly assigned 5 and then printed.