Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why gcc can't inline function pointers that can be determined?

Tags:

c++

gcc

The following program compiled under gcc 4.6.2 on centos with -O3:

#include <iostream>
#include <vector>
#include <algorithm>
#include <ctime>
using namespace std;

template <typename T>
class F {
public:
     typedef void (T::*Func)();

     F(Func f) : f_(f) {}

     void operator()(T& t) {
         (t.*f_)();
     }
private:
     Func f_;
};

struct X {
    X() : x_(0) {}

    void f(){
        ++x_;
    }

    int x_;
};

int main()
{
     const int N = 100000000;
     vector<X> xv(N);
     auto begin = clock();
     for_each (xv.begin(), xv.end(), F<X>(&X::f));
     auto end = clock();
     cout << end - begin << endl;
}

objdump -D shows that the generated code for the loop is:

  40097c:       e8 57 fe ff ff          callq  4007d8 <clock@plt>
  400981:       49 89 c5                mov    %rax,%r13
  400984:       0f 1f 40 00             nopl   0x0(%rax)
  400988:       48 89 ef                mov    %rbp,%rdi
  40098b:       48 83 c5 04             add    $0x4,%rbp
  40098f:       e8 8c ff ff ff          callq  400920 <_ZN1X1fEv>
  400994:       4c 39 e5                cmp    %r12,%rbp
  400997:       75 ef                   jne    400988 <main+0x48>
  400999:       e8 3a fe ff ff          callq  4007d8 <clock@plt>

Obviously gcc doesn't inline the function. Why isn't gcc capable of this optimization? Is there any compiler flag that can make gcc do the desired optimization?

like image 206
Kan Li Avatar asked Apr 06 '12 18:04

Kan Li


1 Answers

Some good reading material on this is Scott Adams Meyers' Effective C++ (Third Edition) Item 30: Understand the ins and outs of inlining, where he claims that a call to function pointer is never inlined. The third edition was published in 2008, and I have indeed been able to get gcc to inline function call by compile-time-constant-pointer starting in gcc 4.6, which came out in 2011 (maybe 2010?). However, this was in C and is tricky. In one scenario, I had to declare the calling function __attribute__((flatten)) before it would inline the call (in this situation, I passed the function pointer as the member of a struct, who's pointer I then passed to an inline function that would make the function call by pointer that got inlined).

So in short, no, this isn't a bug gcc, but that doesn't mean that gcc (and/or other compilers) might not be able to inline this some day. But the real issue, I think, is that you don't understand what's really happening here. To get that understanding, you have to think more like an assembly programmer, or a compiler programmer.

You're passing an object of type F<X> and initializing it with a pointer to a member function of another class. You have not declared your instance F<X> object constant, it's Func f_ member as constant, nor your void F::operator()(T& t) member as constant. At the C++ language level, the compiler has to treat it as non-constant. That still doesn't mean that it can't later, at the optimization stage, determine that your function pointer doesn't change, but you're making it incredibly hard at this point. But at least it's a local. If your F<X> object had been global and not declared static, it would forbid it entirely from being considered constant.

Hopefully, you're doing this on an exercise in inlining by function pointer and not as a real solution for indirection. When you want C++ to make real performance stuff, you use the power of types. Specifically, when I declare a template parameter as a member function pointer, it isn't just a constant, it's part of the type. I've never seen a case where this technique generates a function call.

#include <iostream>
#include <vector>
#include <algorithm>
#include <ctime>
using namespace std;

template <typename T, void (T::*f_)()>
class F {
public:
     void operator()(T& t) {
         (t.*f_)();
     }
};

struct X {
    X() : x_(0) {}

    void f(){
        ++x_;
    }

    int x_;
};

int __attribute__((flatten)) main()
{
     const int N = 100000000;
     vector<X> xv(N);

     auto begin = clock();
     for_each (xv.begin(), xv.end(), F<X, &X::f>());
     auto end = clock();
     cout << end - begin << endl;

}
like image 141
Daniel Santos Avatar answered Sep 17 '22 04:09

Daniel Santos