Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generator called twice in C++20 views pipeline [duplicate]

Here in a simple pipeline of views adaptors, there is the gen function called to generate a sequence of values (using an internal state) and then a filter on it.

What is surprising and counterintuitive (at least for me) is the fact that the generator function is called twice at each iteration, so that the next check on the same filter fails (the filtered value is not reused in the pipeline).

Do you have an idea if this is the correct expected behavior (and why)?

Tested with libstdc++ in GCC 10.3, 11.1 & trunk (code) and range-v3 with GCC & clang (code).

int main() {
  int n = 0;
  auto gen = [&n]() {
    auto result = ++n;
    std::cout << "Generate [" << result << "]\n";
    return result;
  };

  auto tmp =
      ranges::views::iota(0)
      | ranges::views::transform([gen](auto &&) { return gen(); })
      | ranges::views::filter([](auto &&i) {
          std::cout << "#1 " << i << " " << (i % 2) << "\n";
          return (i % 2) == 1;
        });

  for (auto &&i : tmp | ranges::views::take(1)) {
    std::cout << "#2 " << i << " " << ((i % 2) == 1) << "\n";
    assert(((i % 2) == 1));
  }
}

NB: if gen function is written as mutable with an internal state, it does not compile:

  auto gen = [n=0]() mutable {
    auto result = ++n;
    std::cout << "Generate [" << result << "]\n";
    return result;
  };

(and I know that pure functions are better)

like image 252
Pascal H. Avatar asked Apr 29 '21 16:04

Pascal H.


1 Answers

Do you have an idea if this is the correct expected behavior (and why)?

Yes: this is the expected behavior. It is an inherent property of the iteration model where we have operator* and operator++ as separate operations.

filter's operator++ has to look for the next underlying iterator that satisfies the predicate. That involves doing *it on transform's iterator which involves invoking the function. But once we find that next iterator, when we read it again, that will again invoke the transform. In a code snippet:

decltype(auto) transform_view<V, F>::iterator::operator*() const {
    return invoke(f_, *it_);
}

decltype(auto) filter_view<V, P>::iterator::operator*() const {
    // reading through the filter iterator just reads
    // through the underlying iterator, which in this
    // case means invoking the function
    return *it_;
}

auto filter_view<V, P>::iterator::operator++() -> iterator& {
    for (++it_; it_ != ranges::end(parent_->base_); ++it_) {
        // when we eventually find an iterator that satisfies this
        // predicate, we will have needed to read it (which calls
        // the functions) and then the next operator* will do
        // that same thing again
        if (invoke(parent_->pred_, *it_))) {
            break;
        }
    }
    return *this;
}

The result is that we invoke the function twice on every element that satisfies the predicate.


The workaround is to either just not care (have the transform be either cheap enough that invoking it twice doesn't matter or the filter be sufficiently rare that the amount of duplicate transforms don't matter or both) or do add a caching layer into your pipeline.

There's no caching view in C++20 Ranges, but there is one in range-v3 named views::cache1:

ranges::views::iota(0)
    | ranges::views::transform(f)
    | ranges::views::cache1
    | ranges::views::filter(g)

This ensures that f only gets invoked at most once per element, at the cost of having to deal with an element cache and downgrading your range to only be an input range (where before it was bidirectional).

like image 162
Barry Avatar answered Nov 15 '22 06:11

Barry