Policy vs polymorphic speed

Question

I read that using a policy class for a function that will be called in a tight loop is much faster than using a polymorphic function. However, I setup this demo and the timing indicates that it is exactly the opposite!? The policy version takes between 2-3x longer than the polymorphic version.

#include <iostream>

#include <boost/timer.hpp>

// Policy version
template < typename operation_policy>
class DoOperationPolicy : public operation_policy
{
  using operation_policy::Operation;

public:
  void Run(const float a, const float b)
  {
    Operation(a,b);
  }
};

class OperationPolicy_Add
{
protected:
  float Operation(const float a, const float b)
  {
    return a + b;
  }
};

// Polymorphic version
class DoOperation
{
public:
  virtual float Run(const float a, const float b)= 0;
};

class OperationAdd : public DoOperation
{
public:
  float Run(const float a, const float b)
  {
    return a + b;
  }
};

int main()
{
  boost::timer timer;

  unsigned int numberOfIterations = 1e7;

  DoOperationPolicy<OperationPolicy_Add> policy_operation;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
    policy_operation.Run(1,2);
    }
  std::cout << timer.elapsed() << " seconds." << std::endl;
  timer.restart();

  DoOperation* polymorphic_operation = new OperationAdd;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
    polymorphic_operation->Run(1,2);
    }
  std::cout << timer.elapsed() << " seconds." << std::endl;

}

Is there something wrong with the demo? Or is just incorrect that the policy should be faster?

Matthieu M. · Accepted Answer

Your benchmark is meaningless (sorry).

Making real benchmarks is hard, unfortunately, as compilers are very clever.

Things to look for here:

devirtualization: the polymorphic call is expected to be slower because it is supposed to be virtual, but here the compiler can realize than polymorphic_operation is necessarily a OperationAdd and thus directly call OperationAdd::Run without invoking runtime dispatch
inlining: since the compiler has access to the methods body, it can inline them, and avoid the function calls altogether.
"dead store removal": values that are not used need not be stored, and the computations that lead to them and do not provoke side-effects can be avoided entirely.

Indeed, your entire benchmark code can be optimized to:

int main()
{
  boost::timer timer;

  std::cout << timer.elapsed() << " seconds." << std::endl;

  timer.restart();

  DoOperation* polymorphic_operation = new OperationAdd;

  std::cout << timer.elapsed() << " seconds." << std::endl;
}

Which is when you realize that you are not timing what you'd like to...

In order to make your benchmark meaningful you need to:

prevent devirtualization
force side-effects

To prevent devirtualization, just declare a DoOperation& Get() function, and then in another cpp file: DoOperation& Get() { static OperationAdd O; return O; }.

To force side-effects (only necessary if the methods are inlined): return the value and accumulate it, then display it.

In action using this program:

// test2.cpp
namespace so8746025 {

  class DoOperation
  {
  public:
    virtual float Run(const float a, const float b) = 0;
  };

  class OperationAdd : public DoOperation
  {
  public:
    float Run(const float a, const float b)
    {
      return a + b;
    }
  };

  class OperationAddOutOfLine: public DoOperation
  {
  public:
    float Run(const float a, const float b);
  };

  float OperationAddOutOfLine::Run(const float a, const float b)
  {
    return a + b;
  }

  DoOperation& GetInline() {
    static OperationAdd O;
    return O;
  }

  DoOperation& GetOutOfLine() {
    static OperationAddOutOfLine O;
    return O;
  }

} // namespace so8746025

// test.cpp
#include <iostream>

#include <boost/timer.hpp>

namespace so8746025 {

  // Policy version
  template < typename operation_policy>
  struct DoOperationPolicy
  {
    float Run(const float a, const float b)
    {
      return operation_policy::Operation(a,b);
    }
  };

  struct OperationPolicy_Add
  {
    static float Operation(const float a, const float b)
    {
      return a + b;
    }
  };

  // Polymorphic version
  class DoOperation
  {
  public:
    virtual float Run(const float a, const float b) = 0;
  };

  class OperationAdd : public DoOperation
  {
  public:
    float Run(const float a, const float b)
    {
      return a + b;
    }
  };

  class OperationAddOutOfLine: public DoOperation
  {
  public:
    float Run(const float a, const float b);
  };


  DoOperation& GetInline();
  DoOperation& GetOutOfLine();

} // namespace so8746025

using namespace so8746025;

int main()
{
  unsigned int numberOfIterations = 1e8;

  DoOperationPolicy<OperationPolicy_Add> policy;

  OperationAdd stackInline;
  DoOperation& virtualInline = GetInline();

  OperationAddOutOfLine stackOutOfLine;
  DoOperation& virtualOutOfLine = GetOutOfLine();


  boost::timer timer;

  float result = 0;

  for(unsigned int i = 0; i < numberOfIterations; ++i)  {
    result += policy.Run(1,2);
  }
  std::cout << "Policy: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;


  timer.restart();
  result = 0;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
  {
    result += stackInline.Run(1,2);
  }
  std::cout << "Stack Inline: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;

  timer.restart();
  result = 0;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
  {
    result += virtualInline.Run(1,2);
  }
  std::cout << "Virtual Inline: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;

  timer.restart();
  result = 0;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
  {
    result += stackOutOfLine.Run(1,2);
  }
  std::cout << "Stack Out Of Line: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;

  timer.restart();
  result = 0;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
  {
    result += virtualOutOfLine.Run(1,2);
  }
  std::cout << "Virtual Out Of Line: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;

}

We get:

$ gcc --version
gcc (GCC) 4.3.2

$ ./testR
Policy: 0.17 seconds (6.71089e+07)
Stack Inline: 0.17 seconds (6.71089e+07)
Virtual Inline: 0.52 seconds (6.71089e+07)
Stack Out Of Line: 0.6 seconds (6.71089e+07)
Virtual Out Of Line: 0.59 seconds (6.71089e+07)

Note the subtle difference between devirtualization + inline and the absence of devirtualization.

sehe · Answer

FWIW I made it

a policy, as opposed to mixn
return the value
use a volatile to avoid optimizing away of the loop and unrelated optimization of the loop (like, reducing load/stores due to loop unrolling and vectorization on targets that support it).
compare with a direct, static function call
use way more iterations
compile with -O3 on gcc

Timings are:

DoDirect: 3.4 seconds.
Policy: 3.41 seconds.
Polymorphic: 3.4 seconds.

Ergo: there is no difference. Mainly because GCC is able to statically analyze the type of DoOperation* to be DoOperationAdd - there is vtable lookup inside the loop :)

IMPORTANT

If you wanted to benchmark reallife performance of this exact loop, instead of function invocation overhead, drop the volatile. The timings now become

DoDirect: 6.71089e+07 in 1.12 seconds.
Policy: 6.71089e+07 in 1.15 seconds.
Polymorphic: 6.71089e+07 in 3.38 seconds.

As you can see, without volatile, the compiler is able to optimize some load-store cycles away; I assume it might be doing loop unrolling+register allocation there (however I haven't inspected the machine code). The point is, that the loop as a whole can be optimized much more with the 'policy' approach than with the dynamic dispatch (i.e. the virtual method)

CODE

#include <iostream>

#include <boost/timer.hpp>

// Direct version
struct DoDirect {
    static float Run(const float a, const float b) { return a + b; }
};

// Policy version
template <typename operation_policy>
struct DoOperationPolicy {
    float Run(const float a, const float b) const {
        return operation_policy::Operation(a,b);
    }
};

struct OperationPolicy_Add {
    static float Operation(const float a, const float b) {
        return a + b;
    }
};

// Polymorphic version
struct DoOperation {
    virtual float Run(const float a, const float b) const = 0;
};

struct OperationAdd  : public DoOperation { 
    float Run(const float a, const float b) const { return a + b; } 
};

int main(int argc, const char *argv[])
{
    boost::timer timer;

    const unsigned long numberOfIterations = 1<<30ul;

    volatile float result = 0;
    for(unsigned long i = 0; i < numberOfIterations; ++i) {
        result += DoDirect::Run(1,2);
    }
    std::cout << "DoDirect: " << result << " in " << timer.elapsed() << " seconds." << std::endl;
    timer.restart();

    DoOperationPolicy<OperationPolicy_Add> policy_operation;
    for(unsigned long i = 0; i < numberOfIterations; ++i) {
        result += policy_operation.Run(1,2);
    }
    std::cout << "Policy: " << result << " in " << timer.elapsed() << " seconds." << std::endl;
    timer.restart();

    result = 0;
    DoOperation* polymorphic_operation = new OperationAdd;
    for(unsigned long i = 0; i < numberOfIterations; ++i) {
        result += polymorphic_operation->Run(1,2);
    }
    std::cout << "Polymorphic: " << result << " in " << timer.elapsed() << " seconds." << std::endl;

}

Policy vs polymorphic speed

Tags:

c++

oop

David Doria

2 Answers

Matthieu M.

IMPORTANT

CODE

sehe

Recent Activity

Donate For Us

Policy vs polymorphic speed

Tags:

c++

oop

David Doria

2 Answers

Matthieu M.

IMPORTANT

CODE

sehe

Related questions

Recent Activity

Donate For Us