Using gcc v4.8.1
If I do:
//func.hpp
#ifndef FUNC_HPP
#define FUNC_HPP
int func(int);
#endif

//func.cpp
#include "func.hpp"
int func(int x){
    return 5*x+7;
}

//main.cpp
#include <iostream>
#include "func.hpp"
using std::cout;
using std::endl;
int main(){
    cout<<func(5)<<endl;
    return 0;
}
Even the simple function func will not get inlined. No combination of inline, extern, static, and __attribute__((always_inline)) on the prototype and/or the definition changes this (some combinations of these specifiers obviously fail to compile or produce warnings; I'm not talking about those). I'm using g++ *.cpp -O3 -o run to build and g++ *.cpp -O3 -S for assembly output. When I look at the assembly output, I still see call func. It appears the only way I can get the function to be properly inlined is to have the prototype (probably not necessary) and the definition of the function in the header file. If the header is only included by one file in the whole program (only by main.cpp, for example), it will compile and the function will be properly inlined without even needing the inline specifier. If the header is to be included by multiple files, the inline specifier appears to be needed to resolve multiple definition errors, and that appears to be its only purpose. The function is of course inlined properly.
So my question is: am I doing something wrong? Am I missing something? Whatever happened to:
"The compiler is smarter than you. It knows when a function should be inlined better than you do. And never ever use C arrays. Always use std::vector!"
-Every other StackOverflow user
Really? So calling func(5) and printing the result is faster than just printing 32? I will blindly follow you off the edge of a cliff, almighty, all-knowing, and all-wise gcc.
For the record, the above code is just an example. I am writing a ray tracer, and when I moved all of the code of my math and other utility classes to their header files and used the inline specifier, I saw massive performance gains: literally around 10 times faster for some scenes.
The most reliable way to see if a function is being inlined or not is to look at the output from the compiler. Most compilers have a switch to output assembler code for your inspection.
The definition of an inline function doesn't have to be in a header file but, because of the one definition rule (ODR) for inline functions, an identical definition for the function must exist in every translation unit that uses it. The easiest way to achieve this is by putting the definition in a header file.
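A minimal sketch of that layout, reusing func from the question: the definition moves into func.hpp and is marked inline, so every translation unit that includes the header sees the same body and the linker does not report multiple definitions. Whether a given call is actually inlined is still the optimizer's decision.

```cpp
// func.hpp (sketch: the definition now lives in the header)
#ifndef FUNC_HPP
#define FUNC_HPP

// 'inline' allows this identical definition to appear in every
// translation unit that includes the header without breaking the ODR.
inline int func(int x) {
    return 5 * x + 7;
}

#endif
```

With this header, func.cpp is no longer needed, and main.cpp from the question compiles unchanged.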
Any C++ function may be declared inline. If the inline function is a member function of a class that is used from more than one translation unit, its definition has to be visible in each of them, so in practice the code goes in the header file alongside the class definition.
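For a class, that typically looks like the sketch below. Vec3 and its members are hypothetical, chosen only because the question mentions math utility classes in a ray tracer; they are not code from the original post.

```cpp
// vec3.hpp (hypothetical utility header, for illustration only)
#ifndef VEC3_HPP
#define VEC3_HPP

struct Vec3 {
    double x, y, z;

    // Defined inside the class body: implicitly inline.
    double dot(const Vec3& o) const {
        return x * o.x + y * o.y + z * o.z;
    }

    // Declared here, defined further down in the same header.
    Vec3 scaled(double s) const;
};

// Defined outside the class body, so it needs an explicit 'inline'
// to be safely included from several translation units.
inline Vec3 Vec3::scaled(double s) const {
    Vec3 r = {x * s, y * s, z * s};
    return r;
}

#endif
```

Both forms give every including translation unit the full function body, which is what the optimizer needs in order to consider inlining a call.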
An inline function is one for which the compiler copies the code from the function definition directly into the code of the calling function rather than creating a separate set of instructions in memory. This eliminates call-linkage overhead and can expose significant optimization opportunities.
Recent GCC is able to inline across compilation units through link-time optimization (LTO). You need to compile, and link, with -flto; see Link-time optimization and inline and GCC optimize options.
(Actually, LTO is done at link time by a special variant of the compiler, lto1. LTO works by serializing, inside the object files, some of GCC's internal representations, which lto1 also uses. So when compiling src1.c with -flto, the generated src1.o contains the GIMPLE representation in addition to the object binary; and when linking with gcc -flto src*.o, the lto1 "front-end" extracts those GIMPLE representations from inside the src*.o files and recompiles almost everything again...)
You need to explicitly pass -flto both at compile time AND at link time. If using a Makefile you could try make CC='gcc -flto'; otherwise, compile each translation unit with e.g. gcc -Wall -flto -O2 -c src1.c (and likewise for src2.c etc.) and link all of your program (or library) with gcc -Wall -flto -O2 src1.o src2.o -o prog -lsomelib
Notice that -flto will significantly slow down your build (it is not implied by -O3, so you need to pass it explicitly, and you need to link with it as well). Often you get a 5% or 10% performance improvement in the built program at the expense of nearly doubling the build time. Sometimes you can get more.
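Applied to the question's files, the build might look like the sketch below. The source stays as it was, with the definition in func.cpp; only the commands in the comments change. This assumes a GCC build with LTO support enabled, and whether call func actually disappears still depends on the optimizer.

```cpp
// func.cpp, as in the question. The definition stays out of the header;
// LTO is what makes the body visible to callers in other translation units.
//
// Build sketch (-flto appears on the compile steps AND the link step):
//   g++ -O3 -flto -c func.cpp
//   g++ -O3 -flto -c main.cpp
//   g++ -O3 -flto func.o main.o -o run
//
// To check the result, disassemble the linked program (for example with
// objdump -d run) rather than reading per-file -S output, because the
// cross-file inlining decision is only made at link time.

int func(int);  // same prototype as in func.hpp

int func(int x) {
    return 5 * x + 7;
}
```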
The compiler can't inline what it doesn't have. It needs the full body of the function to inline its code.
You have to remember that the compiler only works on one source file at a time (more precisely, one translation unit at a time), and has no idea about other source files and what's in them.
The linker might be able to do it though, as it sees all the code, and some linkers have flags that allow some link-time optimizations.
As far as inlining itself goes, the inline keyword is nothing more than a suggestion to the compiler: "I want this function to be inlined". The compiler can ignore it without even a warning.
In order for your function func(...) to be inlined, your compiler/linker has to support some form of link-time code generation (and optimization). Because func() and main() lie in different translation units, the C++ compiler can't see them both at the same time, and therefore can't inline one function within the other. It needs linker support to do so.
Consult your build tool manuals on how to switch link time code gen features on, if they are supported at all.