The following program compiled under gcc 4.6.2 on centos with -O3:
#include <iostream>
#include <vector>
#include <algorithm>
#include <ctime>
using namespace std;
template <typename T>
class F {
public:
typedef void (T::*Func)();
F(Func f) : f_(f) {}
void operator()(T& t) {
(t.*f_)();
}
private:
Func f_;
};
struct X {
X() : x_(0) {}
void f(){
++x_;
}
int x_;
};
int main()
{
const int N = 100000000;
vector<X> xv(N);
auto begin = clock();
for_each (xv.begin(), xv.end(), F<X>(&X::f));
auto end = clock();
cout << end - begin << endl;
}
objdump -D
shows that the generated code for the loop is:
40097c: e8 57 fe ff ff callq 4007d8 <clock@plt>
400981: 49 89 c5 mov %rax,%r13
400984: 0f 1f 40 00 nopl 0x0(%rax)
400988: 48 89 ef mov %rbp,%rdi
40098b: 48 83 c5 04 add $0x4,%rbp
40098f: e8 8c ff ff ff callq 400920 <_ZN1X1fEv>
400994: 4c 39 e5 cmp %r12,%rbp
400997: 75 ef jne 400988 <main+0x48>
400999: e8 3a fe ff ff callq 4007d8 <clock@plt>
Obviously gcc doesn't inline the function. Why isn't gcc capable of this optimization? Is there any compiler flag that can make gcc do the desired optimization?
Some good reading material on this is Scott Adams Meyers' Effective C++ (Third Edition) Item 30: Understand the ins and outs of inlining, where he claims that a call to function pointer is never inlined. The third edition was published in 2008, and I have indeed been able to get gcc to inline function call by compile-time-constant-pointer starting in gcc 4.6, which came out in 2011 (maybe 2010?). However, this was in C and is tricky. In one scenario, I had to declare the calling function __attribute__((flatten))
before it would inline the call (in this situation, I passed the function pointer as the member of a struct, who's pointer I then passed to an inline function that would make the function call by pointer that got inlined).
So in short, no, this isn't a bug gcc, but that doesn't mean that gcc (and/or other compilers) might not be able to inline this some day. But the real issue, I think, is that you don't understand what's really happening here. To get that understanding, you have to think more like an assembly programmer, or a compiler programmer.
You're passing an object of type F<X>
and initializing it with a pointer to a member function of another class. You have not declared your instance F<X>
object constant, it's Func f_
member as constant, nor your void F::operator()(T& t)
member as constant. At the C++ language level, the compiler has to treat it as non-constant. That still doesn't mean that it can't later, at the optimization stage, determine that your function pointer doesn't change, but you're making it incredibly hard at this point. But at least it's a local. If your F<X>
object had been global and not declared static
, it would forbid it entirely from being considered constant.
Hopefully, you're doing this on an exercise in inlining by function pointer and not as a real solution for indirection. When you want C++ to make real performance stuff, you use the power of types. Specifically, when I declare a template parameter as a member function pointer, it isn't just a constant, it's part of the type. I've never seen a case where this technique generates a function call.
#include <iostream>
#include <vector>
#include <algorithm>
#include <ctime>
using namespace std;
template <typename T, void (T::*f_)()>
class F {
public:
void operator()(T& t) {
(t.*f_)();
}
};
struct X {
X() : x_(0) {}
void f(){
++x_;
}
int x_;
};
int __attribute__((flatten)) main()
{
const int N = 100000000;
vector<X> xv(N);
auto begin = clock();
for_each (xv.begin(), xv.end(), F<X, &X::f>());
auto end = clock();
cout << end - begin << endl;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With