Why is the recursive version of this function faster?

Question

Here is a simple class for iterating over a multidimensional numeric range:

#include <array>
#include <limits>

template <int N>
class NumericRange
{
public:
  //  typedef std::vector<double>::const_iterator const_iterator;
  NumericRange() {
    _lower.fill(std::numeric_limits<double>::quiet_NaN());
    _upper.fill(std::numeric_limits<double>::quiet_NaN());
    _delta.fill(std::numeric_limits<double>::quiet_NaN());
  }
  NumericRange(const std::array<double, N> & lower, const std::array<double, N> & upper, const std::array<double, N> & delta):
    _lower(lower), _upper(upper), _delta(delta) {
    _state.fill(std::numeric_limits<double>::quiet_NaN());
    _next_index_to_advance = 0;
  }

  const std::array<double, N> & get_state() const {
    return _state;
  }

  void start() {
    _state = _lower;
  }

  bool in_range(int index_to_advance = N-1) const {
    return ( _state[ index_to_advance ] - _upper[ index_to_advance ] ) < _delta[ index_to_advance ];
  }

  void advance(int index_to_advance = 0) {
    _state[ index_to_advance ] += _delta[ index_to_advance ];
    if ( ! in_range(index_to_advance) ) {
      if (index_to_advance < N-1) {
    // restart index_to_advance
    _state[index_to_advance] = _lower[index_to_advance];

    // carry
    index_to_advance;
    advance(index_to_advance+1);
      }
    }
  }

private:
  std::array<double, N> _lower, _upper, _delta, _state;
  int _next_index_to_advance;
};

int main() {
  std::array<double, 7> lower{0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
  std::array<double, 7> upper{1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
  std::array<double, 7> delta{0.03, 0.06, 0.03, 0.06, 0.03, 0.06, 0.03};

  NumericRange<7> nr(lower, upper, delta);
  int c = 0;
  for (nr.start(); nr.in_range(); nr.advance()) {
    const std::array<double, 7> & st = nr.get_state();
    ++c;
  }
  std::cout << "took " << c << " steps" << std::endl;

  return 0;
}

When I replace the advance function with a non-recursive variant, the runtime increases:

void advance(int index_to_advance = 0) {
  bool carry;
  do {
    carry = false;
    _state[ index_to_advance ] += _delta[ index_to_advance ];
    if ( ! in_range(index_to_advance) ) {
      if (index_to_advance < N-1) {
    // restart index_to_advance
    _state[index_to_advance] = _lower[index_to_advance];

    // carry
    ++index_to_advance;
    carry = true;
    //    advance(index_to_advance);
      }
    }
  } while (carry);
}

Runtimes were taken using unix user time via the command time. The code was compiled using gcc-4.7 with options -std=c++11 -O3 (but I think it should work with c++0x on gcc-4.6). The recursive version took 13s and the iterative version took 30s. Both require the same number of advance calls to terminate (and if you print the nr.get_state() array inside the for(ns.start()...) loop, both do the same thing).

This is a fun riddle! Help me figure out why recursive would be more efficient / more optimizable.

David Rodríguez - dribeas · Accepted Answer

The recursive version is an example of tail-recursion which means that the compiler can transform the recursion into iteration. Now, once the transformation is performed, the recursive function would look similar to this:

void advance(int index_to_advance = 0) {
    _state[ index_to_advance ] += _delta[ index_to_advance ];
    while ( !in_range(index_to_advance) && index_to_advance < N-1 ) {
        // restart index_to_advance
        _state[index_to_advance] = _lower[index_to_advance];

        // carry
        ++index_to_advance;
        _state[ index_to_advance ] += _delta[ index_to_advance ];
    }
  }

As you see your version contains one extra test and the condition variable. The loop, if you look closely is equivalent to

for( ; index_to_advance < N-1 && !in_range(index_to_advance);++index_to_advance)

(removing the ++index_to_advance at the end), and the optimizer might have a better chance of unrolling that.

That being said, I don't think this explains the huge time difference, although it does explain why the recursive version is not much slower than the iterative one. Check the generated assembly to see what the compiler actually did.

Why is the recursive version of this function faster?

Tags:

c++

compiler-optimization

c++11

user

1 Answers

David Rodríguez - dribeas

Recent Activity

Donate For Us

Why is the recursive version of this function faster?

Tags:

c++

compiler-optimization

c++11

user

1 Answers

David Rodríguez - dribeas

Related questions

Recent Activity

Donate For Us