
Will C++ linker automatically inline functions (without "inline" keyword, without implementation in header)?

Tags:

c++

optimization

compiler-optimization

inline

linker

Will the C++ linker automatically inline "pass-through" functions, which are NOT defined in the header, and NOT explicitly requested to be "inlined" through the inline keyword?

For example, the following happens so often, and should always benefit from "inlining", that it seems every compiler vendor should have "automatically" handled it through "inlining" through the linker (in those cases where it is possible):

//FILE: MyA.hpp
class MyA
{
  public:
    int foo(void) const;
};

//FILE: MyB.hpp
#include "MyA.hpp"
class MyB
{
  private:
    MyA my_a_;
  public:
    int foo(void) const;
};

//FILE: MyB.cpp
#include "MyB.hpp"

// PLEASE SAY THIS FUNCTION IS "INLINED" BY THE LINKER, EVEN THOUGH
// IT WAS NOT IMPLICITLY/EXPLICITLY REQUESTED TO BE "INLINED"?
int MyB::foo(void) const
{
  return my_a_.foo();
}

I'm aware the MSVC linker will perform some "inlining" through its Link Time Code Generation (LTCG), and that the GCC toolchain also supports Link Time Optimization (LTO) (see: Can the linker inline functions?).

Further, I'm aware that there are cases where this cannot be "inlined", such as when the implementation is not "available" to the linker (e.g., across shared library boundaries, where separate linking occurs).

However, if this code is linked into a single executable that does not cross DLL/shared-lib boundaries, I'd expect the compiler/linker vendor to automatically inline the function, as a simple and obvious optimization (benefiting both performance and size)?

Are my hopes too naive?

242

asked Aug 28 '11 21:08

charley

4 Answers

Here's a quick test of your example (with a MyA::foo() implementation that simply returns 42). All these tests were with 32-bit targets - it's possible that different results might be seen with 64-bit targets. It's also worth noting that using the -flto option (GCC) or the /GL option (MSVC) results in full optimization - wherever MyB::foo() is called, it's simply replaced with 42.

With GCC (MinGW 4.5.1):

gcc -g -O3 -o test.exe myb.cpp mya.cpp test.cpp

the call to MyB::foo() was not optimized away. MyB::foo() itself was slightly optimized to:

Dump of assembler code for function MyB::foo() const:
   0x00401350 <+0>:     push   %ebp
   0x00401351 <+1>:     mov    %esp,%ebp
   0x00401353 <+3>:     sub    $0x8,%esp
=> 0x00401356 <+6>:     leave
   0x00401357 <+7>:     jmp    0x401360 <MyA::foo() const>

That is, the entry prologue is left in place but immediately undone (the leave instruction), and the code then jumps to MyA::foo() to do the real work. However, this is an optimization that the compiler (not the linker) is doing, since it realizes that MyB::foo() is simply returning whatever MyA::foo() returns. I'm not sure why the prologue is left in.

MSVC 16 (from VS 2010) handled things a little differently:

MyB::foo() ended up as two jumps - one to a 'thunk' of some sort:

0:000> u myb!MyB::foo
myb!MyB::foo:
001a1030 e9d0ffffff      jmp     myb!ILT+0(?fooMyAQBEHXZ) (001a1005)

And the thunk simply jumped to MyA::foo():

myb!ILT+0(?fooMyAQBEHXZ):
001a1005 e936000000      jmp     myb!MyA::foo (001a1040)

Again - this was largely (entirely?) performed by the compiler, since if you look at the object code produced before linking, MyB::foo() is compiled to a plain jump to MyA::foo().

So to boil all this down - it looks like without explicitly invoking LTO/LTCG, linkers today are unwilling/unable to perform the optimization of removing the call to MyB::foo() altogether, even if MyB::foo() is a simple jump to MyA::foo().

So I guess if you want link-time optimization, use -flto (for GCC), or /GL (for the MSVC compiler) together with /LTCG (for the MSVC linker).
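As a rough sketch of the setup tested above, the three translation units might look like the listing below. The contents of test.cpp were not shown in the answer, so the caller `call_through_b` is a hypothetical stand-in, and everything is placed in one listing only so that it compiles on its own. The build lines in the comments use the flags named above; the exact optimization levels are an assumption.

```cpp
// Sketch only: in the real test each "FILE" section is a separate file.
// Possible build lines (flags as named in the answer; -O3 and /O2 are assumptions):
//   GCC:  g++ -O3 -flto -o test.exe mya.cpp myb.cpp test.cpp
//   MSVC: cl /O2 /GL mya.cpp myb.cpp test.cpp /link /LTCG

//FILE: MyA.hpp and MyB.hpp (declarations only, no bodies visible to callers)
class MyA
{
  public:
    int foo() const;
};

class MyB
{
  private:
    MyA my_a_;
  public:
    int foo() const;
};

//FILE: mya.cpp
// The answer's test used an implementation that simply returns 42.
int MyA::foo() const
{
  return 42;
}

//FILE: myb.cpp
// The pass-through function from the question.
int MyB::foo() const
{
  return my_a_.foo();
}

//FILE: test.cpp (hypothetical caller, not shown in the answer)
// Without LTO/LTCG this call stays a real call into myb.cpp; with it,
// the answer reports the call being folded down to the constant 42.
int call_through_b()
{
  MyB b;
  return b.foo();
}
```

Comparing the disassembly of the caller with and without the link-time flags is one way to check what a particular toolchain version actually does.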

132

answered Sep 29 '22 03:09

Michael Burr

Is it common? Yes, for mainstream compilers.

Is it automatic? Generally not. MSVC requires the /GL switch, gcc and clang the -flto flag.

How does it work? (gcc only)

The traditional linker used in the gcc toolchain is ld, and it's kind of dumb. Therefore, and it might be surprising, link-time optimization is not performed by the linker in the gcc toolchain.

Gcc has a specific intermediate representation on which the optimizations are performed that is language agnostic: GIMPLE. When compiling a source file with -flto (which activates the LTO), it saves the intermediate representation in a specific section of the object file.

When invoking the linker driver (note: NOT the linker directly) with -flto, the driver will read those specific sections, bundle them together into a big chunk, and feed this bundle to the compiler. The compiler reapplies the optimizations as it usually does for a regular compilation (constant propagation and inlining, which in turn may expose new opportunities for dead code elimination, loop transformations, and so on) and produces a single big object file.

This big object file is finally fed to the regular linker of the toolchain (probably ld, unless you're experimenting with gold), which performs its linker magic.

Clang works similarly, and I surmise that MSVC uses a similar trick.

answered Sep 29 '22 02:09

Matthieu M.

It depends. Most compilers (linkers, really) support this kind of optimization. But in order for it to be done, the entire code-generation phase pretty much has to be deferred to link time. MSVC calls the option link-time code generation (LTCG), and it is enabled by default in release builds, IIRC.

GCC has a similar option, under a different name, but I can't remember which -O levels, if any, enable it, or if it has to be enabled explicitly.

However, "traditionally", C++ compilers have compiled a single translation unit in isolation, after which the linker has merely tied up the loose ends, ensuring that when translation unit A calls a function defined in translation unit B, the correct function address is looked up and inserted into the calling code.

If you follow this model, then it is impossible to inline functions defined in another translation unit.

It is not just some "simple" optimization that can be done "on the fly", like, say, loop unrolling. It requires the linker and compiler to cooperate, because the linker will have to take over some of the work normally done by the compiler.

Note that the compiler will gladly inline functions that are not marked with the inline keyword. But only if it is aware of how the function is defined at the site where it is called. If it can't see the definition, then it can't inline the call. That is why you normally define such small trivial "intended-to-be-inlined" functions in headers, making their definitions visible to all callers.
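A minimal sketch of that header-based approach, applied to the classes from the question, could look like this. The body of `MyA::foo()` (returning 42) is an assumption borrowed from the test in another answer, and both headers are shown in one listing only so that it compiles on its own.

```cpp
// Sketch only: each "FILE" section would normally be its own header.

//FILE: MyA.hpp
class MyA
{
  public:
    // Defined inside the class body, so the function is implicitly inline
    // and its definition is visible to every file that includes the header.
    // The return value 42 is an assumed placeholder implementation.
    int foo() const { return 42; }
};

//FILE: MyB.hpp
class MyB
{
  private:
    MyA my_a_;
  public:
    // The pass-through is visible at every call site, so an optimizing
    // compiler is free to inline it without any link-time optimization.
    int foo() const { return my_a_.foo(); }
};
```

The trade-off is that any change to these function bodies now forces every file that includes the headers to be recompiled.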

answered Sep 29 '22 03:09

jalf

Inlining is not a linker function.

The toolchains that support whole program optimization (cross-TU inlining) do so by not actually compiling anything, just parsing and storing an intermediate representation of the code, at compile time. And then the linker invokes the compiler, which does the actual inlining.

This is not done by default; you have to request it explicitly with appropriate command-line options to the compiler and linker.

One reason it is not, and should not be, the default is that it increases dependency-based rebuild times dramatically (sometimes by several orders of magnitude, depending on code organization).

answered Sep 29 '22 01:09

Ben Voigt
