In my experience, there's a lot of code that explicitly uses inline functions, which comes with tradeoffs:

1. The code becomes less succinct and somewhat less maintainable.
2. Sometimes, inlining can greatly increase run-time performance.
3. Inlining is decided at a fixed point in time, maybe without a terribly good foreknowledge of its uses, or without considering all (future) surrounding circumstances.
The question is: does link-time optimization (e.g., in GCC) render manual inlining, i.e., declaring a function "inline" in C99 and providing an implementation, obsolete? Is it true that we don't need to consider inlining for most functions ourselves? And what about functions that always benefit from inlining, e.g., deg_to_rad(x)?
Clarification: I am not thinking about functions that are in the same translation unit anyway, but about functions that should logically reside in different translation units.
Update: I have often seen opposition to "inline", with suggestions that it is obsolete. Personally, however, I do see explicitly inlined functions often: as functions defined in a class body.
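For reference, the C99 idiom I mean looks roughly like this (deg_to_rad is just the illustrative helper mentioned above, and this is a minimal sketch assuming the usual pattern of an inline definition in the header plus one extern inline declaration in a single .c file):

deg_to_rad.h

#define DEG_TO_RAD_PI 3.14159265358979323846

/* inline definition, visible to every caller that includes this header */
inline double deg_to_rad(double deg) {
    return deg * (DEG_TO_RAD_PI / 180.0);
}

deg_to_rad.c

#include "deg_to_rad.h"

/* exactly one translation unit provides the external definition (C99 inline rules) */
extern inline double deg_to_rad(double deg);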
Link-time optimization is relevant for programming languages that compile programs on a file-by-file basis and then link those files together (such as C and Fortran), rather than all at once (as with Java's just-in-time (JIT) compilation).
Inlining is the process of replacing a subroutine or function call at the call site with the body of the subroutine or function being called. This eliminates call-linkage overhead and can expose significant optimization opportunities.
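As a tiny sketch of what that replacement looks like (the function names here are made up for illustration):

static int square(int x) { return x * x; }

int area(int side) {
    /* after inlining, this call effectively becomes: return side * side; */
    return square(side);
}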
By declaring a function inline, you can direct GCC to integrate that function's code into the code for its callers.
inline functions might make it faster: as described above, procedural integration might remove a bunch of unnecessary instructions, which might make things run faster.
inline functions might make it slower: too much inlining might cause code bloat, which might cause “thrashing” on demand-paged virtual-memory systems.
Even with LTO, a compiler still has to use heuristics to decide whether or not to inline a function at every call (note that it makes the decision per call, not per function). The heuristic takes into account factors such as: is the call in a loop, is the loop unrolled, how big the function is, and how frequently it is called globally. At compile time, the compiler can never accurately determine how frequently code will be called, or whether the code expansion is likely to blow out the instruction/trace/loop/microcode caches of a particular CPU.
Profile-guided optimization (PGO) is supposed to be a step towards addressing this, but if you've ever tried it, you've likely noticed that the performance swing is on the order of 0-2%, and it can go in either direction! :-) It's still a work in progress.
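For reference, the basic GCC PGO workflow looks like this (app.c and the input are placeholder names): build an instrumented binary, run it on a representative workload to collect profile data, then rebuild using that profile:

gcc -O3 -fprofile-generate -o app.out app.c
./app.out representative_input
gcc -O3 -fprofile-use -o app.out app.c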
If performance is your ultimate goal, you really know what you are doing, and you really do a thorough analysis of your code, what you need is a way to tell the compiler to inline or not inline on a per-call basis, not a per-function hint. In practice I have managed this by using compiler-specific "force_no_inline" type hints for the cases where I don't want inlining, and a separate "force_inline" copy of the function (or a macro, in the rare case this fails) for when I want it inlined. If anyone knows how to do this in a cleaner way with compiler-specific hints (for any C/C++ compilers), please let me know.
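As a sketch of that "two copies" approach using GCC/Clang function attributes (always_inline and noinline are real GCC attributes; the clamp functions are a made-up example):

/* copy that is always inlined at call sites where we want the expansion */
static inline __attribute__((always_inline)) int clamp_inl(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* non-inlined wrapper for call sites where we do not want the expansion */
static __attribute__((noinline)) int clamp_call(int v, int lo, int hi) {
    return clamp_inl(v, lo, hi);
}

int process(int v) {
    int a = clamp_inl(v, 0, 255);  /* call site where inlining is wanted */
    int b = clamp_call(v, -1, 1);  /* call site where it is not */
    return a + b;
}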
To specifically address your points:
1. The code becomes less succinct and somewhat less maintainable.
Generally, no - it's just a keyword hint that controls how it is inlined. However, if you jump through hoops like those I described in the previous paragraph, then yes.
2. Sometimes, inlining can greatly increase run-time performance.
When leaving the compiler to its own devices - yes, it certainly can, but mostly doesn't. The compiler has heuristics that make good, although not always optimal, inlining decisions. Specifically for the keyword, compilers may ignore it entirely or treat it as a weak hint - in general they seem averse to inlining code that red-flags their heuristics (like inlining a 16k function into a loop unrolled 16x).
3. Inlining is decided at a fixed point in time, maybe without a terribly good foreknowledge of its uses, or without considering all (future) surrounding circumstances.
Yes, it uses static analysis. Dynamic analysis can come from your own insight and from manually controlling inlining on a per-call basis, or theoretically from PGO (which still sucks).
GCC 9 Binutils 2.33 experiment to show that LTO can inline
For those that are curious whether ld inlines across object files or not, here is a quick experiment that confirms that it can:
main.c
int notmain(void);
int main(void) {
    return notmain();
}
notmain.c
int notmain(void) {
    return 42;
}
Compile with LTO and disassemble:
gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -c -o main.o main.c
gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -c -o notmain.o notmain.c
gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -o main.out notmain.o main.o
gdb -batch -ex "disassemble/rs main" main.out
Disassembly output:
0x0000000000001040 <+0>: b8 2a 00 00 00 mov $0x2a,%eax
0x0000000000001045 <+5>: c3 retq
so we see that there is no callq or other jump, which means that the call was inlined across the two object files.
Without -flto, however, we see:
0x0000000000001040 <+0>: f3 0f 1e fa endbr64
0x0000000000001044 <+4>: e9 f7 00 00 00 jmpq 0x1140 <notmain>
so now there is a JMPQ, which means that the call was not inlined.
Note that, as an optimization, the compiler chose a JMPQ, which does not make any of the stack changes a more naive CALLQ would; I think this is a trivial minimal case of tail call optimization.
So yes, if you are using -flto, you don't need to worry about putting definitions in headers so they can be inlined.
The main downside of having definitions in headers is that they may slow down compilation. For C++ templates, you may also be interested in explicit template instantiation: Explicit template instantiation - when is it used?
Tested on Ubuntu 19.10 amd64.