Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Template trick to optimize out allocations

Tags:

c++

I have:

struct DoubleVec {
  std::vector<double> data;
};

DoubleVec operator+(const DoubleVec& lhs, const DoubleVec& rhs) {
  DoubleVec ans(lhs.size());
  for(int i = 0; i < lhs.size(); ++i) {
    ans[i] = lhs[i]] + rhs[i]; // assume lhs.size() == rhs.size()
  }
  return ans;
}

DoubleVec someFunc(DoubleVec a, DoubleVec b, DoubleVec c, DoubleVec d) {
  DoubleVec ans = a + b + c + d;
}

Now, in the above, the "a + b + c + d" will cause the creation of 3 temporary DoubleVec's -- is there a way to optimize this away with some type of template magic ... i.e. to optimize it down to something equivalent to:

DoubleVec ans(a.size());
for(int i = 0; i < ans.size(); i++) ans[i] = a[i] + b[i] + c[i] + d[i];

You can assume all DoubleVec's have the same # of elements.

The high level idea is to have do some type of templateied magic on "+", which "delays the computation" until the =, at which point it looks into itself, goes hmm ... I'm just adding thes numbers, and syntheizes a[i] + b[i] + c[i] + d[i] ... instead of all the temporaries.

Thanks!

like image 648
anon Avatar asked Dec 24 '10 16:12

anon


3 Answers

Yep, that's exactly what expression templates (see http://www.drdobbs.com/184401627 or http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template for example) are for.

The idea is to make operator+ return some kind of proxy object which represents the expression tree to be evaluated. Then operator= is written to take such an expression tree and evaluate it all at once, avoiding the creation of temporaries, and applying any other optimizations that may be applicable.

like image 177
jalf Avatar answered Nov 06 '22 07:11

jalf


Have a look at Boost.Proto, which is a library for writing EDSL (embedded domain specific languages) directly in C++. There is even an example showing exactly what you need.

like image 31
hkaiser Avatar answered Nov 06 '22 05:11

hkaiser


http://codeidol.com/cpp/cpp-template-metaprogramming/Domain-Specific-Embedded-Languages/-10.5.-Blitz-and-Expression-Templates/

If we had to boil the problem solved by Blitz++ down to a single sentence, we'd say, "A naive implementation of array math is horribly inefficient for any interesting computation." To see what we mean, take the boring statement

x = a + b + c;

The problem here is that the operator+ signature above is just too greedy: It tries to evaluate a + b just as soon as it can, rather than waiting until the whole expression, including the addition of c, is available.

In the expression's parse tree, evaluation starts at the leaves and proceeds upwards to the root. What's needed here is some way of delaying evaluation until the library has all of the expression's parts: that is, until the assignment operator is executed. The stratagem taken by Blitz++ is to build a replica of the compiler's parse tree for the whole expression, allowing it to manage evaluation from the top down

This can't be any ordinary parse tree, though: Since array expressions may involve other operations like multiplication, which require their own evaluation strategies, and since expressions can be arbitrarily large and nested, a parse tree built with nodes and pointers would have to be traversed at runtime by the Blitz++ evaluation engine to discover its structure, thereby limiting performance. Furthermore, Blitz++ would have to use some kind of runtime dispatching to handle the different combinations of operation types, again limiting performance.

Instead, Blitz++ builds a compile-time parse tree out of expression templates. Here's how it works in a nutshell: Instead of returning a newly computed Array, operators just package up references to their arguments in an Expression instance, labeled with the operation:

// operation tags
struct plus; struct minus;

// expression tree node
template <class L, class OpTag, class R>
struct Expression
{
    Expression(L const& l, R const& r)
      : l(l), r(r) {}

    float operator[](unsigned index) const;

    L const& l;
    R const& r;
};

// addition operator
template <class L, class R>
Expression<L,plus,R> operator+(L const& l, R const& r)
{
    return Expression<L,plus,R>(l, r);
}

Notice that when we write a + b, we still have all the information needed to do the computationit's encoded in the type Expressionand the data is accessible through the expression's stored references. When we write a + b + c, we get a result of type:

Expression<Expression<Array,plus,Array>,plus,Array>
like image 1
Industrial-antidepressant Avatar answered Nov 06 '22 05:11

Industrial-antidepressant