How deep do compilers inline functions?

Tags:

c++

function

inline

compiler-construction

Say I have some functions, each of about two simple lines of code, and they call each other like this: A calls B calls C calls D ... calls K. (So basically it's a long series of short function calls.) How deep will compilers usually go in the call tree to inline these functions?

344

asked Sep 18 '11 17:09

Paul Manta

2 Answers

The question is not meaningful.

If you think about inlining, and its consequences, you'll realise it:

Avoids a function call (with all the register saving/frame adjustment)
Exposes more context to the optimizer (dead stores, dead code, common sub-expression elimination...)
Duplicates code (bloating the instruction cache and the executable size, among other things)

When deciding whether to inline or not, the compiler thus performs a balancing act between the potential bloat created and the speed gain expected. This balancing act is affected by options: for gcc, -O3 means optimize for speed while -Os means optimize for size (Clang also offers -Oz); on inlining they have almost opposite behaviors!

Therefore, what matters is not the "nesting level" but the number of instructions (possibly weighted, as not all are created equal).

This means that a simple forwarding function:

int foo(int a, int b) { return foo(a, b, 3); }

is essentially "transparent" from the inlining point of view.
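The same reasoning applies to a chain like the one in the question. Here is a minimal sketch; the functions a through d and their bodies are made up for illustration, and whether a given compiler actually flattens the chain depends on its heuristics and flags:

```cpp
#include <cassert>

// Hypothetical chain a -> b -> c -> d; the names and bodies are illustrative.
static int d(int x) { return x * 2; }
static int c(int x) { return d(x) + 1; }
static int b(int x) { return c(x + 3); }
static int a(int x) { return b(x) - 5; }

// Entry point of the chain, visible outside this translation unit.
int chain(int x) { return a(x); }

// What the chain reduces to if every level is inlined: (x + 3) * 2 + 1 - 5.
int flattened(int x) { return 2 * x + 2; }

// Both versions must agree, whatever the compiler decides to inline.
void check_chain() {
  assert(chain(0) == flattened(0));
  assert(chain(10) == flattened(10));
}
```

Compiling this with optimizations on and looking at the generated assembly (for example with g++ -O2 -S) is the easiest way to see what your compiler did with chain.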

On the other hand, a function counting a hundred lines of code is unlikely to get inlined. The exception is that static free functions called only once are almost systematically inlined, as inlining does not create any duplication in this case.

From these two examples we get a sense of how the heuristics behave:

the fewer instructions the function has, the better for inlining
the less often it is called, the better for inlining

After that, there are parameters you should be able to set to influence the decision one way or another (MSVC has __forceinline, which hints strongly at inlining; gcc has the -finline-limit flag to "raise" the threshold on the instruction count; etc.)
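As a rough sketch of how such a hint is usually wrapped for portability: the macro name FORCE_INLINE and the functions add3, caller and check_force_inline are made up for this example, while __forceinline (MSVC) and __attribute__((always_inline)) (GCC/Clang) are the compiler-specific spellings.

```cpp
#include <cassert>

// FORCE_INLINE is a hypothetical macro name chosen for this sketch.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // plain hint on compilers we do not recognise
#endif

// Small helper we would like to see expanded at every call site.
FORCE_INLINE int add3(int a, int b) { return a + b + 3; }

// Ordinary function; the call to add3 is a candidate for forced inlining.
int caller(int a, int b) { return add3(a, b); }

// The result is the same whether or not the hint is honoured.
void check_force_inline() { assert(caller(1, 2) == 6); }
```

Even these attributes remain requests: a compiler may still refuse in some situations (recursion, for instance), so checking the generated code is the only way to be sure.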

On a tangent: do you know about partial inlining?

It was introduced in gcc in 4.6. The idea, as the name suggests, is to partially inline a function. Its main purpose is to avoid the overhead of a function call when the function is "guarded" and may (in some cases) return nearly immediately.

For example:

void foo(Bar* x) {
  if (not x) { return; } // null pointer, pfff!

  // ... BIG BLOCK OF STATEMENTS ...
}

void bar(Bar* x) {
  // DO 1
  foo(x);
  // DO 2
}

could get "optimized" as:

void foo@0(Bar* x) {
  // ... BIG BLOCK OF STATEMENTS ...
}

void bar(Bar* x) {
  // DO 1
  if (x) { foo@0(x); }
  // DO 2
}

Of course, once again the heuristics for inlining apply, but they can now be applied separately to the cheap guard and to the expensive body!
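If your compiler does not perform this split itself, the same shape can be written by hand. The sketch below is only an illustration: the Bar layout and the names foo_body and check_partial are assumptions, and the one-line foo_body stands in for the big block of statements.

```cpp
#include <cassert>

// Hypothetical payload type for the sketch.
struct Bar { int value; };

// Stand-in for the big block of statements; meant to stay out of line.
void foo_body(Bar* x) { x->value += 1; }

// Cheap guard: small enough that inlining it at every call site costs little.
inline void foo(Bar* x) {
  if (!x) { return; }  // null pointer: nothing to do
  foo_body(x);
}

// Caller: after inlining foo, only the null check remains at the call site.
int bar(Bar* x) {
  foo(x);
  return x ? x->value : -1;
}

// Exercises both the guarded-out path and the full path.
void check_partial() {
  Bar b{1};
  assert(bar(&b) == 2);
  assert(bar(nullptr) == -1);
}
```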

And finally, unless you use WPO (Whole Program Optimization) or LTO (Link Time Optimization), functions can only be inlined if their definition is in the same TU (Translation Unit) as the call site.
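A single-file sketch of what that means in practice; the file names in the comments and the functions twice, thrice, use_twice, use_thrice and check_tu are hypothetical:

```cpp
#include <cassert>

// --- math_utils.h (hypothetical header) ---
// Definition in the header: every TU that includes it sees the body,
// so the call in use_twice is an inlining candidate without LTO.
inline int twice(int x) { return 2 * x; }

// Declaration only: a TU that sees just this line cannot inline the call.
int thrice(int x);

// --- user.cpp (hypothetical) ---
int use_twice(int x) { return twice(x); }
int use_thrice(int x) { return thrice(x); }

// --- math_utils.cpp (hypothetical) ---
// In a real project this definition would live in its own .cpp file;
// it is placed here only so that the sketch links as a single file.
int thrice(int x) { return 3 * x; }

// Behaviour is identical either way; only the optimizer's view differs.
void check_tu() {
  assert(use_twice(4) == 8);
  assert(use_thrice(4) == 12);
}
```

With the three parts split into real files, inlining thrice into use_thrice would require something like -flto on GCC or Clang, or /GL together with /LTCG on MSVC.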

answered Oct 01 '22 16:10

Matthieu M.

I've seen compilers inline more than 5 functions deep. But at some point, it basically becomes a space-efficiency trade-off that the compiler makes. Every compiler is different in this aspect. Visual Studio is very conservative with inlining. GCC (under -O3) and the Intel Compiler love to inline...

answered Oct 01 '22 16:10

Mysticial
