If I inline a function, the function body is copied into the call site instead of a call being issued to it. Why can that lead to bad performance?
Edit: And what about cache misses caused by functions that are too big? Why does the rule of thumb "only inline functions of at most 3 lines" exist then?
Inline functions might make the code faster, or they might make it slower. They might make the executable larger, or they might make it smaller. They might cause thrashing, or they might prevent thrashing. And they might be, and often are, totally irrelevant to speed.
5) Inline functions may not be useful for many embedded systems, because in embedded systems code size is often more important than speed. 6) Inline functions might cause thrashing, because inlining might increase the size of the binary executable. Thrashing in memory degrades the performance of the computer.
There's no guarantee that functions will be inlined. You can't force the compiler to inline a particular function, even with the __forceinline keyword.
Answer. An inline function is one for which the compiler copies the code from the function definition directly into the code of the calling function rather than creating a separate set of instructions in memory. This eliminates call-linkage overhead and can expose significant optimization opportunities.
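As a rough illustration of the "optimization opportunities" mentioned above, here is a minimal sketch. The names `square` and `hypotenuse_squared` are made up for this example. Once the call is inlined, an optimizing compiler can typically see both arguments are constants and fold the whole expression, but whether that actually happens depends on the compiler and its settings.

```cpp
#include <cassert>

// Hypothetical helper: a single expression with no loops,
// which makes it a typical inlining candidate.
inline int square(int x) { return x * x; }

// If square() is inlined here, the compiler sees 3 * 3 + 4 * 4
// and can usually reduce it to a constant at compile time.
int hypotenuse_squared() {
    int result = square(3) + square(4);
    assert(result == 25);  // holds whether or not inlining happened
    return result;
}
```

The observable result is the same either way; inlining only changes how the machine code is laid out, which is why its effect has to be measured rather than assumed.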
There may be an edge case where inlining a function can increase the program size or move bits of the program around so that cache misses occur where they didn't before. It wouldn't be common, since caches are designed to handle most common situations and are quite large compared to most hotspots.
There's no standard way to force inline functions in modern C++ compilers, so this is kind of a moot point. However, assuming you are using compiler-specific functionality to force inline (and the compiler doesn't ignore it) it wouldn't lead to bad performance but it would lead to increased executable size due to there being more copies of the same code.
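The compiler-specific functionality referred to here usually means `__forceinline` on MSVC or `__attribute__((always_inline))` on GCC and Clang. A common way to wrap them is sketched below; the `FORCE_INLINE` macro and the `clamp_to_byte` and `brighten` functions are names invented for this example, and even these attributes are strong requests that the compiler may still decline in some cases.

```cpp
#include <cassert>

// Pick the strongest inlining request the current compiler offers.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fallback: the ordinary, non-binding hint
#endif

// Hypothetical small helper that we ask the compiler to inline.
FORCE_INLINE int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// Each call site of clamp_to_byte() gets its own copy of the body,
// which is where the extra executable size comes from.
int brighten(int pixel, int delta) {
    assert(delta >= -255 && delta <= 255);  // assumed precondition for this sketch
    return clamp_to_byte(pixel + delta);
}
```

For example, `brighten(250, 10)` yields 255 and `brighten(100, 20)` yields 120, regardless of whether the helper ended up inlined.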
Edit: Per the comment below it should be mentioned that a very unlikely edge case does exist where your code could be executing different copies of the same inlined function in close proximity, reducing the efficiency of the instruction cache. The likelihood that this will measurably affect performance is extremely small, but in certain edge cases it could.
We should take a step back and look at how CPUs work. They usually have separate caches: one for code, which holds the instructions the CPU is about to execute, and one for data, which holds the values those instructions operate on.
Data cache misses are "easy" to solve: use the smallest data structures you can, and place the members you access most frequently close together...
Instruction cache misses are more difficult to understand and solve, and they are also one reason why polymorphic behavior in C++ is commonly considered slower than normal function calls. Basically, the CPU prefetches into its caches the instructions stored close to the current execution point. If everything is inlined, there is simply more code, and the CPU won't be able to prefetch all of it, which leads to cache misses. Please note that this is a simplified picture. In my experience, I have had problems with template instantiations that generated a lot of code, leading to slower performance than simple virtual calls with a not-too-deep object hierarchy.
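To make the data-structure advice concrete, here is a small sketch with two made-up structs, `Loose` and `Tight`, that hold the same members in a different order. On many 64-bit platforms the first is padded to 24 bytes and the second to 16, but the exact sizes are implementation-defined, so the code only checks that the reordered version is not larger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Small members on both sides of a large one: the compiler usually
// inserts padding after each 1-byte member to align the 8-byte one.
struct Loose {
    std::uint8_t  flag;
    std::uint64_t id;
    std::uint8_t  kind;
};

// Same members, largest first: the two small members share one padded tail.
struct Tight {
    std::uint64_t id;
    std::uint8_t  flag;
    std::uint8_t  kind;
};

static_assert(sizeof(Tight) <= sizeof(Loose),
              "reordering should never make the struct larger");

// Bytes saved for an array of element_count elements (hypothetical helper).
std::size_t bytes_saved(std::size_t element_count) {
    assert(element_count <= SIZE_MAX / sizeof(Loose));  // guard against overflow
    return element_count * (sizeof(Loose) - sizeof(Tight));
}
```

Smaller elements mean more of them fit in each cache line, which is the effect the advice is after.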
As Alexandrescu always points out, you should always time your code.
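A minimal way to time code is sketched below using `std::chrono::steady_clock`. The helper `time_ns` and the workload `sum_of_squares` are hypothetical names for this example, and a real benchmark would also need warm-up runs, repeated measurements, and care that the optimizer does not remove the work being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Runs `work` `iterations` times and returns the total elapsed
// wall-clock time in nanoseconds.
template <typename Fn>
std::int64_t time_ns(Fn&& work, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Example workload: 1^2 + 2^2 + ... + n^2.
std::uint64_t sum_of_squares(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i * i;
    }
    return total;
}
```

One way to use it is to build the same workload twice, once with the inlining request and once without, call `time_ns` on each with a lambda that stores the result in a `volatile` variable, and compare the numbers over several runs.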
Source: What Every Programmer Should Know About Memory