Few questions about C++ inline functions

Tags:

c++

inline

The training materials from the class I took seem to be making two conflicting statements.

On one hand:

"Use of inline functions usually results in faster execution"

On the other hand:

"Use of inline functions may decrease performance due to more frequent swapping"

Question 1: Are both statements true?

Question 2: What is meant by "swapping" here?

Please glance at this snippet:

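#include <iostream>
using namespace std;

// Timer is the asker's own helper class (its definition is not shown);
// ms() presumably returns the elapsed time in milliseconds.

int powA(int a, int b) {
    return (a + b) * (a + b);
}

inline int powB(int a, int b) {
    return (a + b) * (a + b);
}

int main() {
    Timer t;  // stack object, so nothing needs to be deleted

    for (int a = 0; a < 9000; ++a) {
        for (int b = 0; b < 9000; ++b) {
            int i = (a + b) * (a + b);      //              322 ms   <-----
            //  int i = powA(a, b);         // not inline : 450 ms
            //  int i = powB(a, b);         // inline :     469 ms
        }
    }

    double d = t.ms();
    cout << "-->  " << d << endl;

    return 0;
}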

Question 3: Why is performance so similar between powA and powB? I would have expected powB to take around 322 ms, since it is, after all, inline.

James Leonard asked Dec 11 '22 23:12


2 Answers

Question 1

Yes, both statements can be true, in particular circumstances. Obviously they won't both be true at the same time.

Question 2

"Swapping" is likely a reference to OS paging behaviour, where pages are swapped out to disk when the memory pressure becomes high.

In practice, if your inline functions are small then you will usually notice a performance improvement due to eliminating the overhead of a function call and return. In very rare circumstances, though, inlining may cause the code to grow so much that a performance-critical tight loop no longer fits entirely in the CPU cache, and you may experience decreased performance. If you're coding at that level, however, then you probably should be coding directly in assembly language anyway.

Question 3

The inline modifier is a hint to the compiler that it might want to consider compiling the given function inline. It doesn't have to follow your directions, and the result may also depend on the given compiler options. You can always look at the generated assembly code to find out what it did.

Your benchmark may not even be doing what you want because your compiler might be smart enough to see that you're not even using the result of the function call that you assign into i, so it might not even bother to call your function. Again, look at the generated assembly code.
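A minimal sketch of a benchmark that is harder for the optimizer to discard, assuming C++11 and std::chrono in place of the question's Timer class. The helper name timed_sum is hypothetical, and the arithmetic is widened to 64 bits so the running total cannot overflow. Every result is added to a total that is printed and returned, so the calls have an observable effect. The compiler is still free to transform the loop, so the generated assembly remains the final authority.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

// Same arithmetic as the question's powA/powB, widened to 64 bits.
std::int64_t powA(std::int64_t a, std::int64_t b) {
    return (a + b) * (a + b);
}

inline std::int64_t powB(std::int64_t a, std::int64_t b) {
    return (a + b) * (a + b);
}

// Hypothetical helper: sums f(a, b) over an n-by-n grid, prints the elapsed
// time in milliseconds, and returns the sum.
template <typename F>
std::int64_t timed_sum(const char* label, int n, F f) {
    const auto start = std::chrono::steady_clock::now();
    std::int64_t total = 0;
    for (int a = 0; a < n; ++a) {
        for (int b = 0; b < n; ++b) {
            total += f(a, b);  // the result feeds the total, so it is used
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    // Printing the total makes the work observable outside the loop.
    std::cout << label << ": " << elapsed.count()
              << " ms (total " << total << ")\n";
    return total;
}
```

Calling timed_sum("powA", 9000, powA) and timed_sum("powB", 9000, powB) from main reproduces the question's loop bounds. Both calls should return the same total, which doubles as a sanity check that the two variants compute the same thing.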

Greg Hewgill answered Dec 14 '22 13:12


inline inserts the code at the call site, saving the creation of a stack frame, the saving and restoring of registers, and a call (branch). In other words, using inline (when it works) is similar to writing the code of the inlined function in place of its call.

However, inline isn't guaranteed to do anything, and its effect is compiler-dependent. The compiler will also sometimes inline functions that aren't declared inline. It can do this on its own whenever the function's definition is visible in the same translation unit (e.g. when the function is static); across translation units it takes link-time optimization.

If you want to force MSVC to inline functions, use __forceinline and check the assembly. There should be no calls: your code should compile to a simple sequence of instructions executed linearly.
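As a rough sketch of how this might look across compilers: the macro names FORCE_INLINE and NO_INLINE below are hypothetical, and the GCC/Clang attributes are an addition not mentioned in the answer. __forceinline and __declspec(noinline) are MSVC keywords; __attribute__((always_inline)) and __attribute__((noinline)) are their GCC/Clang counterparts. A noinline variant is included because comparing against a call that is guaranteed to stay a call is a convenient way to see what inlining actually buys.

```cpp
// FORCE_INLINE and NO_INLINE are hypothetical macro names for this sketch.
#if defined(_MSC_VER)
  #define FORCE_INLINE __forceinline
  #define NO_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
  #define FORCE_INLINE inline __attribute__((always_inline))
  #define NO_INLINE __attribute__((noinline))
#else
  // Unknown compiler: fall back to the plain hint and to no restriction.
  #define FORCE_INLINE inline
  #define NO_INLINE
#endif

// Requested to be expanded at every call site.
FORCE_INLINE int pow_forced(int a, int b) {
    return (a + b) * (a + b);
}

// Requested to stay a real function call.
NO_INLINE int pow_called(int a, int b) {
    return (a + b) * (a + b);
}
```

Both functions return the same values; only the generated code should differ. Checking the assembly is still the way to confirm that the compiler honoured either request.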

Regarding the speed: you can indeed make your code faster by inlining small functions. When you inline large functions, however, your code size grows ("large" is hard to define; you need to run tests to determine what's large and what's not). That's because the code of the inlined function is repeated at every call site. After all, the whole point of calling a function is to reduce the instruction count by reusing the same subroutine from multiple places in the code.

When the code size becomes larger, the instruction caches may be overwhelmed, leading to slower code execution.

Another point to consider: modern out-of-order CPUs (most desktop CPUs, e.g. Intel Core Duo or i7) have mechanisms such as branch prediction and instruction prefetching that follow calls ahead of execution and effectively "inline" them at the hardware level. So aggressive inlining doesn't always make sense.

In your example, you need to look at the assembly that your compiler generates. It may be the same for the inline and non-inline versions. If the function isn't inlined and you're using MSVC, try __forceinline. If the timing is the same in both cases, it means your CPU does a good job of prefetching instructions and the execution-time bottleneck is elsewhere.

Sergiy Migdalskiy answered Dec 14 '22 13:12