Link-time optimization and inline

In my experience, there's a lot of code that explicitly uses inline functions, which involves a tradeoff:

  1. The code becomes less succinct and somewhat less maintainable.
  2. Sometimes, inlining can greatly increase run-time performance.
  3. Inlining is decided at a fixed point in time, maybe without a terribly good foreknowledge of its uses, or without considering all (future) surrounding circumstances.

The question is: does link-time optimization (e.g., in GCC) render manual inlining, e.g., declaring a function "inline" in C99 and providing an implementation, obsolete? Is it true that we don't need to consider inlining for most functions ourselves? And what about functions that always benefit from inlining, e.g., deg_to_rad(x)?

Clarification: I am not thinking about functions that are in the same translation-unit anyway, but about functions that should logically reside in different translation-units.
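
To make the cross-translation-unit case concrete, this is the kind of C99 pattern I have in mind (a minimal sketch; the file names and deg_to_rad are only placeholders): the body lives in a header so every translation unit can see it, and one .c file supplies the external definition.

deg.h

#ifndef DEG_H
#define DEG_H

/* inline definition: visible to every translation unit that includes this header */
inline double deg_to_rad(double deg) {
    return deg * (3.14159265358979323846 / 180.0);
}

#endif

deg.c

#include "deg.h"

/* C99: exactly one translation unit must provide the external definition */
extern inline double deg_to_rad(double deg);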

Update: I have often seen opposition to "inline", and it has been suggested to be obsolete. Personally, however, I do see explicitly inlined functions often: as functions defined in a class body.

asked Aug 12 '11 by ccom


2 Answers

Even with LTO, a compiler still has to use heuristics to determine whether or not to inline a function for every call (note that it makes the decision not per function, but per call). The heuristics take into account factors such as: is the call in a loop, is the loop unrolled, how big is the function, how frequently is it called globally, and so on. The compiler will certainly never be able to accurately determine at compile time how frequently code is called, or whether the code expansion is likely to blow out the instruction/trace/loop/microcode caches of a particular CPU.
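
These heuristics can be nudged globally - though still not decided per call - through GCC's inlining parameters; for example (the numbers are purely illustrative):

gcc -O2 --param max-inline-insns-auto=60 --param inline-unit-growth=40 -c foo.c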

Profile Guided Optimization is supposed to be a step towards addressing this, but if you've ever tried it, you are likely to have noticed that you can get a swing in performance on the order of 0-2%, and it can be in either direction! :-) It's still a work in progress.
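
For reference, the basic GCC PGO workflow looks roughly like this (a sketch - app.c and the workload are placeholders, and the quality of the profile depends entirely on how representative that run is):

gcc -O3 -fprofile-generate -o app app.c    # instrumented build
./app < representative_input               # run it to produce .gcda profile data
gcc -O3 -fprofile-use -o app app.c         # rebuild, letting GCC use the profile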

If performance is your ultimate goal, and you really know what you are doing and have really done a thorough analysis of your code, what you need is a way to tell the compiler to inline or not inline on a per-call basis, not a per-function hint. In practice I have managed this by using compiler-specific "force_no_inline" type hints for cases where I don't want inlining, and a separate "force_inline" copy of the function (or a macro in the rare case that fails) for when I want it inlined. If anyone knows how to do this in a cleaner way with compiler-specific hints (for any C/C++ compilers), please let me know.
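
For example, with GCC (and Clang) those two copies can be written with function attributes, something along these lines (the clamp_* names are made up for illustration):

/* copy the compiler is forced to inline at every call site */
static inline __attribute__((always_inline)) int clamp_inl(int x) {
    return x < 0 ? 0 : x;
}

/* copy the compiler is forbidden to inline */
static __attribute__((noinline)) int clamp_call(int x) {
    return x < 0 ? 0 : x;
}

At each call site you then pick clamp_inl() or clamp_call() depending on whether you want the expansion there.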

To specifically address your points:

1.The code becomes less succinct and somewhat less maintainable.

Generally, no - it's just a keyword hint about whether the function should be inlined. However, if you jump through hoops like I described in the last paragraph, then yes.

2.Sometimes, inlining can greatly increase run-time performance.

When leaving the compiler to its own devices - yes, it certainly can, but mostly doesn't. The compiler has good heuristics that make good, although not always optimal, inlining decisions. Specifically for the keyword, compilers may ignore it entirely, or use it as a weak hint - in general they do seem averse to inlining code that red-flags their heuristics (like inlining a 16k function into a loop unrolled 16x).

3.Inlining is decided at a fixed point in time, maybe without a terribly good foreknowledge of its uses, or without considering all (future) surrounding circumstances.

Yes, it uses static analysis. Dynamic knowledge can come from your own insight, by manually controlling inlining on a per-call basis, or theoretically from PGO (which still sucks).

answered Sep 29 '22 by Crowley9

GCC 9 Binutils 2.33 experiment to show that LTO can inline

For those who are curious whether ld inlines across object files, here is a quick experiment confirming that it can:

main.c

int notmain(void);

int main(void) {
    return notmain();
}

notmain.c

int notmain(void) {
    return 42;
}

Compile with LTO and disassemble:

gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -c -o main.o main.c
gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -c -o notmain.o notmain.c
gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -o main.out notmain.o main.o
gdb -batch -ex "disassemble/rs main" main.out

Disassembly output:

   0x0000000000001040 <+0>:     b8 2a 00 00 00  mov    $0x2a,%eax
   0x0000000000001045 <+5>:     c3      retq 

so we see that there is no callq or other jumps, which means that the call was inlined across the two object files.

Without -flto however we see:

   0x0000000000001040 <+0>:     f3 0f 1e fa     endbr64 
   0x0000000000001044 <+4>:     e9 f7 00 00 00  jmpq   0x1140 <notmain>

so now there is a JMPQ, which means that the call was not inlined.

Note that, as an optimization, the compiler chose a JMPQ, which does not make any stack changes as a more naive CALLQ would; I think this is a trivial minimal case of tail call optimization.

So yes, if you are using -flto, you don't need to worry about putting definitions in headers so they can be inlined.

The main downside of having definitions in headers is that they may slow down compilation. For C++ templates, you may also be interested in explicit template instantiation: Explicit template instantiation - when is it used?

Tested in Ubuntu 19.10 amd64.