Understanding C++ function Inlining

Tags:

c++

visual-c++

inlining

I am using a Microsoft-specific keyword (__forceinline) to force a global function to be inlined, but I noticed that the function is not inlined if it uses an object that has an explicit trivial (empty) destructor.

Quoting from MSDN:

Even with __forceinline, the compiler cannot inline code in all circumstances. The compiler cannot inline a function if:

The function or its caller is compiled with /Ob0 (the default option for debug builds).

The function and the caller use different types of exception handling (C++ exception handling in one, structured exception handling in the other).

The function has a variable argument list.

The function uses inline assembly, unless compiled with /Og, /Ox, /O1, or /O2.

The function is recursive and not accompanied by #pragma inline_recursion(on). With the pragma, recursive functions are inlined to a default depth of 16 calls. To reduce the inlining depth, use inline_depth pragma.

The function is virtual and is called virtually. Direct calls to virtual functions can be inlined.

The program takes the address of the function and the call is made via the pointer to the function. Direct calls to functions that have had their address taken can be inlined.

The function is also marked with the naked __declspec modifier.

I am using the following self-contained program to test the behavior:

#include <iostream>
#define INLINE __forceinline
template <class T>
struct rvalue
{
    T& r_;
    explicit INLINE rvalue(T& r) : r_(r) {}
};

template <class T>
INLINE
T movz(T& t)
{
    return T(rvalue<T>(t));
}
template <class T>
class Spam
{
public:
    INLINE operator rvalue<Spam>()  { return rvalue<Spam>(*this); }
    INLINE Spam() : m_value(0)  {}
    INLINE Spam(rvalue<Spam> p) : m_value(p.r_.m_value) {}
    INLINE Spam& operator= (rvalue<Spam> p) 
    {
        m_value = p.r_.m_value;
        return *this; 
    }
    INLINE explicit Spam(T value) : m_value(value) {    }
    INLINE operator T() { return m_value; };
    template <class U, class E> INLINE  Spam& operator= (Spam<U> u) { return *this; }
    INLINE ~Spam() {}
private:
    Spam(Spam<T>&); // not defined
    Spam& operator= (Spam&); // not defined
private:
    T m_value; 
};
INLINE int foo()
{
    Spam<int> p1(int(5)), p2;
    p2 = movz(p1);
    return p2;
}

int main()
{
    std::cout << foo() << std::endl;
}

With the trivial destructor INLINE ~Spam() {} in place, we get the following disassembly:

int main()
{
000000013F4B1010  sub         rsp,28h  
    std::cout << foo() << std::endl;
000000013F4B1014  lea         rdx,[rsp+30h]  
000000013F4B1019  lea         rcx,[rsp+38h]  
000000013F4B101E  mov         dword ptr [rsp+30h],5  
000000013F4B1026  call        movz<Spam<int> > (013F4B1000h)  
000000013F4B102B  mov         rcx,qword ptr [__imp_std::cout (013F4B2050h)]  
000000013F4B1032  mov         edx,dword ptr [rax]  
000000013F4B1034  call        qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013F4B2040h)]  
000000013F4B103A  mov         rdx,qword ptr [__imp_std::endl (013F4B2048h)]  
000000013F4B1041  mov         rcx,rax  
000000013F4B1044  call        qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013F4B2058h)]  
}

whereas without the destructor INLINE ~Spam() {} we get the following disassembly:

int main()
{
000000013FF01000  sub         rsp,28h  
    std::cout << foo() << std::endl;
000000013FF01004  mov         rcx,qword ptr [__imp_std::cout (013FF02050h)]  
000000013FF0100B  mov         edx,5  
000000013FF01010  call        qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013FF02040h)]  
000000013FF01016  mov         rdx,qword ptr [__imp_std::endl (013FF02048h)]  
000000013FF0101D  mov         rcx,rax  
000000013FF01020  call        qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013FF02058h)]  
}
000000013FF01026  xor         eax,eax  
}

I fail to understand why, in the presence of the destructor, the compiler does not inline the function T movz(T& t).

Note: The behavior is consistent from 2008 to 2013.
Note: I checked with Cygwin GCC, and that compiler does inline the code. I cannot verify other compilers at the moment, but will update within the next 12 hours if required.

asked Sep 11 '14 20:09

Abhijit

1 Answer

Yes, it's a bug. I tested the code in a Qt environment with the MinGW compiler, and it optimizes everything very well.

First, I changed your code slightly, as shown below, to make the assembly code easier to read:

int main()
{
    int i = foo();
    std::cout << i << std::endl;
}

And here is the disassembly from my Qt debug session:

        45  int main()
        46  {
0x401600                    lea    0x4(%esp),%ecx
0x401604  <+0x0004>         and    $0xfffffff0,%esp
0x401607  <+0x0007>         pushl  -0x4(%ecx)
0x40160a  <+0x000a>         push   %ebp
0x40160b  <+0x000b>         mov    %esp,%ebp
0x40160d  <+0x000d>         push   %ecx
0x40160e  <+0x000e>         sub    $0x54,%esp
0x401611  <+0x0011>         call   0x402160 <__main>
0x401616  <+0x0016>         movl   $0x5,-0x10(%ebp)
        47      int i = foo();
0x401683  <+0x0083>         mov    %eax,-0xc(%ebp)
        48      std::cout << i << std::endl;
0x401686  <+0x0086>         mov    -0xc(%ebp),%eax
0x401689  <+0x0089>         mov    %eax,(%esp)
0x40168c  <+0x008c>         mov    $0x6fcba2c0,%ecx
0x401691  <+0x0091>         call   0x401714 <_ZNSolsEi>
0x401696  <+0x0096>         sub    $0x4,%esp
0x401699  <+0x0099>         movl   $0x40171c,(%esp)
0x4016a0  <+0x00a0>         mov    %eax,%ecx
0x4016a2  <+0x00a2>         call   0x401724 <_ZNSolsEPFRSoS_E>
0x4016a7  <+0x00a7>         sub    $0x4,%esp
        49  }
0x4016aa  <+0x00aa>         mov    $0x0,%eax
0x4016af  <+0x00af>         mov    -0x4(%ebp),%ecx
0x4016b2  <+0x00b2>         leave
0x4016b3  <+0x00b3>         lea    -0x4(%ecx),%esp
0x4016b6  <+0x00b6>         ret

You can see that even foo() is inlined: the listing contains no call to foo() or movz(), and the constant 5 is stored directly (movl $0x5,-0x10(%ebp)) before the value of 'i' is printed.

answered Sep 28 '22 03:09

Robin Hsu
