A more efficient way than function reference?

Question

I have a class that uses a reference to a function:

double u( const double& x, const double& y )
{
  return x * y;
}

class equation
{
public:
  equation( double (&in_u)(const double&, const double&) );
//...
protected:
  double (&u)(const double&, const double&);
};

This function would be called something like 10⁸ times during a typical run.

The class goes into a library and the function u is defined by the user of the library. So I cannot have the function definition inside the class.

I have read this:

(std::function) ... has the disadvantage of introducing some (very small) overhead when being called (so in a very performance-critical situation it might be a problem but in most it should not)

Are there any more efficient ways of passing the function u to the class equation? And would this count as "a very performance-critical situation"?

EDIT

There seems to be a bit of confusion. Just to make it clear: the function u is known at the executable's compile time, but not at the library's. Getting the function at run time is a feature I will consider in later versions of the library, but not now.

Ben Voigt · Accepted Answer

A function pointer (or reference, which is almost identical at the implementation level) will work just fine.

Modern CPUs are very good at branch prediction: after the first couple of calls, the CPU will recognize that this "indirect" call always goes to the same place and use speculative execution to keep the pipeline full.

However, there will still be no optimization across the function boundary: no inlining, no auto-vectorization.

If this function is being called 10⁸ times, it is likely that a large number of those are in a very tight loop with varying parameters. In that case, I suggest changing the function prototype to accept an array of parameter values and output an array of results. Then have a loop inside the function, where the compiler can perform optimizations such as unrolling and auto-vectorization.

(This is a specific case of the general principle of dealing with interop cost by reducing the number of calls across the boundary.)
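A minimal sketch of the batched interface described above, assuming the user supplies a function that fills an output array from two input arrays of equal length. The names `u_batch`, `BatchFunction` and `equation::evaluate_all` are hypothetical, chosen only for illustration; they are not part of the asker's library.

```cpp
#include <cstddef>
#include <vector>

// User side (hypothetical name): one call processes a whole batch, so the
// loop lives where the compiler can unroll and auto-vectorize it.
void u_batch(const double* x, const double* y, double* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = x[i] * y[i];
}

// Library side: stores a plain function pointer to the batch function.
using BatchFunction = void (*)(const double*, const double*, double*, std::size_t);

class equation
{
public:
    explicit equation(BatchFunction f) : u_(f) {}

    // Evaluates u for every (x[i], y[i]) pair with a single indirect call.
    // Assumes x.size() == y.size().
    std::vector<double> evaluate_all(const std::vector<double>& x,
                                     const std::vector<double>& y) const
    {
        std::vector<double> out(x.size());
        u_(x.data(), y.data(), out.data(), x.size());
        return out;
    }

private:
    BatchFunction u_;
};
```

The indirect call now happens once per batch instead of once per element, which is the "fewer calls across the boundary" idea applied to this case.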

If that isn't possible, then do pass the parameters by value. As others have said, this is more efficient than const reference for floating-point variables, probably a lot more efficient, since most calling conventions pass them in floating-point registers (typically SSE registers on modern Intel architectures; before that, the x87 stack), where they are ready for computation immediately. Spilling values to and from RAM in order to pass them by reference is quite costly. When the function gets inlined, pass-by-reference is optimized away, but that won't happen here. This is still not as good as passing an entire array, though.
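A sketch of the pass-by-value variant, keeping the asker's function-reference member but changing the signature to take `double` by value. The `evaluate` member is a hypothetical name used only for illustration.

```cpp
// User-defined function, now taking its arguments by value so they can
// travel in floating-point registers under common calling conventions.
double u(double x, double y)
{
    return x * y;
}

class equation
{
public:
    // A reference member must be bound in the constructor's initializer list.
    explicit equation(double (&in_u)(double, double)) : u_(in_u) {}

    // Hypothetical member that calls through the stored function reference.
    double evaluate(double x, double y) const { return u_(x, y); }

private:
    double (&u_)(double, double);
};
```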

sfjac · Answer

Given that the function isn't known at compile time, you won't get any faster than a function pointer/reference.

The advantage of std::function is that it would allow you to take, say, a functor, a member function pointer or a lambda expression. But there is some overhead.

As one comment mentioned, I would replace the const double & arguments with double. The size is the same on most platforms these days, and it removes a dereference.

Here is an example using std::function:

#include <stdio.h>      // printf
#include <functional>   // std::function, std::bind
#include <math.h>       // sqrt

double multiply(double x, double y) { return x * y; }
double add(double x, double y) { return x + y; }

class equation
{
public:
    using ComputeFunction_t = std::function<double(double, double)>;

    template <typename FunctionPtr>
    equation(FunctionPtr pfn)
        : computeFunction_m(pfn)
    { }

    void compute(double d1, double d2)
    {
        printf("(%f, %f) => %f\n", d1, d2, computeFunction_m(d1, d2));
    }

protected:
    ComputeFunction_t computeFunction_m;
};

int main() {
    equation prod(multiply);
    prod.compute(10, 20); // print 200

    equation sum(add);
    sum.compute(10, 20);  // print 30

    equation hypotenuse([](double x, double y){ return sqrt(x*x + y*y); });
    hypotenuse.compute(3, 4); // print 5

    struct FooFunctor
    {
        FooFunctor(double d = 1.0) : scale_m(d) {}

        double operator()(double x, double y) { return scale_m * (x + y); }
      private:
        double scale_m;
    };

    equation fooadder(FooFunctor{});
    fooadder.compute(10, 20); // print 30

    equation fooadder10(FooFunctor{10.0});
    fooadder10.compute(10, 20); // print 300

    struct BarFunctor
    {
        BarFunctor(double d = 1.0) : scale_m(d) {}

        double scaledAdd(double x, double y) { return scale_m * (x + y); }
      private:
        double scale_m;
    };

    BarFunctor bar(100.0);
    std::function<double(double,double)> barf = std::bind(&BarFunctor::scaledAdd, &bar, std::placeholders::_1, std::placeholders::_2);
    equation barfadder(barf);
    barfadder.compute(10, 20); // print 3000

    return 0;
}

But, again, this gain in flexibility does have a small runtime cost. Whether it's worth the cost depends on the application. I'd probably lean toward generality and a flexible interface first, and then profile later to see if it is a real issue for the sorts of functions that will be used in practice.
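To follow the "profile later" advice, a rough timing harness along these lines can compare a plain function pointer with std::function on the same workload. It is only a sketch: `time_calls` is a hypothetical helper, and an optimizing compiler may still transform either loop, so real measurements should come from the actual solver.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>   // std::function, for the wrapped variant passed to time_calls

double multiply(double x, double y) { return x * y; }

// Hypothetical helper: calls f(i, 0.5) n times, stores the accumulated result
// in *sum (so the work cannot simply be discarded) and returns elapsed ms.
template <typename F>
double time_calls(F f, std::size_t n, double* sum)
{
    auto start = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        acc += f(static_cast<double>(i), 0.5);
    auto stop = std::chrono::steady_clock::now();
    *sum = acc;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling it once with a `double (*)(double, double)` and once with a `std::function<double(double, double)>` wrapping the same function, using an `n` close to the 10⁸ calls mentioned in the question, gives a first indication of whether the overhead matters.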

If you can make your solver into a header-only library, then when the user provides inlinable functions in their code, you may be able to get better performance. For instance:

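#include <stdio.h>
#include <type_traits>  // std::decay_t
#include <utility>      // std::forward

template <typename ComputeFunction>
class Equation
{
  public:

    Equation(ComputeFunction fn)
      : computeFunction_m(fn)
    { }

    void compute(double d1, double d2)
    {
        printf("(%f, %f) => %f\n", d1, d2, computeFunction_m(d1, d2));
    }

  protected:
    ComputeFunction computeFunction_m;
};

// std::decay_t makes sure an lvalue argument is stored by value, not as a reference.
template <typename ComputeFunction>
auto make_equation(ComputeFunction &&fn)
{
    return Equation<std::decay_t<ComputeFunction>>(std::forward<ComputeFunction>(fn));
}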

Your instantiation of the Equation class can now completely inline the execution of the function. Calling is very similar, given the make_equation function (the above implementation assumes C++14, but the C++11 version isn't much different):

auto fooadder2 = make_equation(FooFunctor{});
fooadder2.compute(10, 20);

auto hypot2 = make_equation([](double x, double y){ return sqrt(x*x + y*y); });
hypot2.compute(3, 4);

With full optimization you'll likely only find the call to printf with the results of the calculation in the compiled code.
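For reference, a C++11 version mainly differs in spelling out the return type, since C++11 has no return type deduction for ordinary functions. This sketch also stores a decayed copy of the callable, so passing an lvalue does not leave `Equation` holding a reference. To keep it short, the class exposes a hypothetical `evaluate` member instead of `compute`.

```cpp
#include <type_traits>
#include <utility>

template <typename ComputeFunction>
class Equation
{
public:
    explicit Equation(ComputeFunction fn) : computeFunction_m(std::move(fn)) {}

    // Calls the stored callable directly; its concrete type is known here,
    // so the compiler is free to inline it.
    double evaluate(double d1, double d2) { return computeFunction_m(d1, d2); }

protected:
    ComputeFunction computeFunction_m;
};

// C++11: the return type must be written out. std::decay strips references
// and cv-qualifiers, so an lvalue callable is stored by value.
template <typename ComputeFunction>
Equation<typename std::decay<ComputeFunction>::type>
make_equation(ComputeFunction&& fn)
{
    return Equation<typename std::decay<ComputeFunction>::type>(
        std::forward<ComputeFunction>(fn));
}
```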

NetVipeC · Answer

Using template arguments:

struct u {
    double operator()(const double& x, const double& y) { return x * y; }
};

template <typename Function>
class equation {
public:
    equation();
    //...
    double using_the_function(double x, double y) {
        //...
        auto res = f(x, y);
        //...
        return res;
    }

private:
    Function f;
};

If the function doesn't need to modify its parameters, it's better to pass them by value (for built-in types, these are most likely values that will be loaded into CPU registers, or are already there).

struct u {
    double operator()(double x, double y) { return x * y; }
};

This will most likely let the compiler inline u into the using_the_function method. In your case the compiler cannot do that, because the function reference could refer to any function.
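A self-contained version of this template approach, assuming the function object is default-constructible (as `u` is) and takes its arguments by value. The `sum_over` loop is a hypothetical stand-in for the solver's hot loop; because `Function` is a concrete type, each `f(x, y)` there is a direct call the compiler can inline.

```cpp
#include <cstddef>
#include <vector>

// User-supplied function object; arguments are taken by value.
struct u {
    double operator()(double x, double y) const { return x * y; }
};

template <typename Function>
class equation {
public:
    equation() = default;                        // requires a default-constructible Function
    explicit equation(Function fn) : f(fn) {}

    double using_the_function(double x, double y) const { return f(x, y); }

    // Hypothetical hot loop: direct calls on a known type, so they are
    // candidates for inlining and vectorization.
    double sum_over(const std::vector<double>& xs, double y) const
    {
        double total = 0.0;
        for (std::size_t i = 0; i < xs.size(); ++i)
            total += f(xs[i], y);
        return total;
    }

private:
    Function f;
};
```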

The possible downside of this approach is code bloat, if you need to support a lot of different functions and/or the class is big.

Tags: c++, performance, reference, class, c++11

Eliad
