If I inline a function, the function body is copied to the call site instead of a call being issued. Why can that lead to bad performance? Edit: And what about cache misses caused by functions that become too big? Why does the rule of thumb "only inline functions with at most 3 lines" exist?


Why can forced inline functions lead to bad performance? [duplicate]

3 Answers

There may be an edge case where inlining a function can increase the program size or move bits of the program around so that cache misses occur where they didn't before. It wouldn't be common, since caches are designed to handle most common situations and are quite large compared to most hotspots.

answered Oct 01 '22 19:10

Mark Ransom

There's no standard way to force a function to be inlined in modern C++ compilers, so this is kind of a moot point. However, assuming you are using compiler-specific functionality to force inlining (and the compiler doesn't ignore it), it wouldn't lead to bad performance, but it would lead to increased executable size because there are more copies of the same code.

Edit: Per the comment below, it should be mentioned that a very unlikely edge case does exist: your code could be executing different copies of the same inlined function in close proximity, which reduces the efficiency of the instruction cache. The likelihood that this will measurably affect performance is extremely small, but in certain edge cases it could.
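To make "compiler-specific functionality" concrete, here is a minimal sketch of how forced inlining is usually requested. FORCE_INLINE, rotate_left and mix are hypothetical names made up for this example; the underlying keywords (__forceinline on MSVC, the always_inline attribute on GCC and Clang) are the real compiler extensions, and other compilers may offer nothing equivalent.

```cpp
#include <cstdint>

// FORCE_INLINE is a hypothetical macro name chosen for this sketch.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // plain hint; the compiler is free to ignore it
#endif

// Small helper that the compiler is asked to expand at every call site.
FORCE_INLINE std::uint32_t rotate_left(std::uint32_t value, unsigned shift) {
    shift &= 31u;                  // keep the shift count in range
    if (shift == 0) return value;  // avoid the undefined 32-bit shift below
    return (value << shift) | (value >> (32u - shift));
}

// Each of the three calls below gets its own copy of rotate_left's body,
// which is where the extra executable size comes from.
std::uint32_t mix(std::uint32_t a, std::uint32_t b) {
    return rotate_left(a, 5) ^ rotate_left(b, 13) ^ rotate_left(a + b, 7);
}
```

For a helper this small the duplicated code is negligible. The size cost only starts to matter when a large body is forced into many call sites.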

answered Oct 01 '22 18:10

mbgda

We should take a step back and explain how CPUs work. They usually have separate caches: one for code, holding the instructions the CPU is about to execute, and one for data, holding the values those instructions operate on.

Data cache misses are "easy" to solve: use the smallest data structures you can, put the members you access most frequently close together, and so on.
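As a rough illustration of the "smallest data structures" advice, the sketch below declares the same members in two different orders. ParticleLoose, ParticleTight, kAssumedCacheLine and per_cache_line are hypothetical names, and the 64-byte cache line is an assumption that holds on many current CPUs but not all of them.

```cpp
#include <cstddef>
#include <cstdint>

// Members declared in an order that forces padding between them.
struct ParticleLoose {
    bool          alive;     // 1 byte, then padding up to the alignment of position
    double        position;
    bool          visible;   // 1 byte, then padding up to the alignment of id
    std::uint32_t id;
};

// Same members, ordered from largest alignment to smallest.
struct ParticleTight {
    double        position;
    std::uint32_t id;
    bool          alive;
    bool          visible;
};

// Assumed cache line size in bytes; check the target CPU before relying on it.
constexpr std::size_t kAssumedCacheLine = 64;

// How many whole objects of type T fit into one assumed cache line.
template <typename T>
constexpr std::size_t per_cache_line() {
    return kAssumedCacheLine / sizeof(T);
}

static_assert(sizeof(ParticleTight) <= sizeof(ParticleLoose),
              "reordering should never make the struct larger here");
```

On a typical 64-bit ABI the loose layout tends to take 24 bytes and the tight one 16, so more elements of an array fit into each cache line. The exact sizes depend on the platform, which is why the sketch only asserts the ordering.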

Instruction cache misses are more difficult to understand and solve, and that's also the reason why it's commonly accepted that polymorphic behavior in C++ is slower than normal function calls. Basically, the CPU prefetches into its caches the instructions stored close to the current execution point. If everything is inlined, there is simply more code, and the CPU won't be able to prefetch all of it, which leads to cache misses. Please note that this is a simplified picture. In my experience, I have had problems with template instantiations that generated a lot of code, leading to slower performance than simple virtual calls with a not-too-deep object hierarchy.

As Alexandrescu always points out, you should always time your code.
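A minimal sketch of what "timing your code" could look like for this question: the same loop is run once through a function the compiler is told not to inline and once through one it is told to always inline. All names here (NEVER_INLINE, ALWAYS_INLINE, step_called, step_inlined, Timed, time_loop, run_called, run_inlined) are hypothetical, and the attributes are compiler-specific extensions of MSVC, GCC and Clang. This is not a rigorous benchmark.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical macro names wrapping compiler-specific extensions.
#if defined(_MSC_VER)
#  define NEVER_INLINE __declspec(noinline)
#  define ALWAYS_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define NEVER_INLINE __attribute__((noinline))
#  define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#  define NEVER_INLINE
#  define ALWAYS_INLINE inline
#endif

// Identical bodies; only the inlining request differs.
NEVER_INLINE std::uint64_t step_called(std::uint64_t acc, std::uint64_t x) {
    return acc * 31u + x;  // unsigned arithmetic wraps, so overflow is well defined
}

ALWAYS_INLINE std::uint64_t step_inlined(std::uint64_t acc, std::uint64_t x) {
    return acc * 31u + x;
}

struct Timed {
    std::uint64_t result;        // returned so the work cannot be optimized away
    double        milliseconds;  // wall-clock time of the loop
};

// Runs step over data and measures the elapsed time with a monotonic clock.
template <typename Step>
Timed time_loop(const std::vector<std::uint64_t>& data, Step step) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t acc = 0;
    for (std::uint64_t x : data) acc = step(acc, x);
    const auto stop = std::chrono::steady_clock::now();
    return {acc, std::chrono::duration<double, std::milli>(stop - start).count()};
}

Timed run_called(const std::vector<std::uint64_t>& data) {
    return time_loop(data, [](std::uint64_t a, std::uint64_t x) { return step_called(a, x); });
}

Timed run_inlined(const std::vector<std::uint64_t>& data) {
    return time_loop(data, [](std::uint64_t a, std::uint64_t x) { return step_inlined(a, x); });
}
```

To get numbers worth trusting, call run_called and run_inlined on a large vector several times in an optimized build and compare the milliseconds fields. Which variant wins depends on the compiler, the CPU and the surrounding code, which is exactly why measuring beats guessing.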

Source: What Every Programmer Should Know About Memory

answered Oct 01 '22 17:10

dau_sama


Tags:

c++

inline

simonides
