Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Proxy objects in iterators

I have a big vector of items that belong to a certain class.

struct item {
    int class_id;
    //some other data...
};

The same class_id can appear multiple times in the vector, and the vector is constructed once and then sorted by class_id. So all elements of the same class are next to each other in the vector.

I later have to process the items per class, ie. I update all items of the same class but I do not modify any item of a different class. Since I have to do this for all items and the code is trivially parallelizable I wanted to use Microsoft PPL with Concurrency::parallel_for_each(). Therefore I needed an iterator and came up with a forward iterator that returns the range of all items with a certain class_id as proxy object. The proxy is simply a std::pair and the proxy is the iterator's value type.

using item_iterator = std::vector<item>::iterator;
using class_range = std::pair<item_iterator, item_iterator>;

//iterator definition
class per_class_iterator : public std::iterator<std::forward_iterator_tag, class_range> { /* ... */ };

By now I was able to loop over all my classes and update the items like this.

std::vector<item> items;
//per_class_* returns a per_class_iterator
std::for_each(items.per_class_begin(), items.per_class_end(),
[](class_range r) 
{ 
    //do something for all items in r 
    std::for_each(r.first, r.second, /* some work */);
});

When replacing std::for_each with Concurrency::parallel_for_each the code crashed. After debugging I found the problem to be the following code in _Parallel_for_each_helper in ppl.h at line 2772 ff.

// Add a batch of work items to this functor's array
for (unsigned int _Index=0; (_Index < _Size) && (_First != _Last); _Index++)
{
    _M_element[_M_len++] = &(*_First++);
}

It uses postincrement (so a temporary iterator is returned), dereferences that temporary iterator and takes the address of the dereferenced item. This only works if the item returned by dereferencing a temporary object survives, ie. basically if it points directly into the container. So fixing this is easy, albeit the per class std::for_each work loop has to be replaced with a for-loop.

//it := iterator somewhere into the vector of items (item_iterator)
for(const auto cur_class = it->class_id; cur_class == it->class_id; ++it)
{
    /* some work */
}

My question is if returning proxy objects the way I did is violating the standard or if the assumption that every iterator dereferences into permanent data has been made by Microsoft for their library, but is not documented. At least I could not find any documentation on the iterator requirements for parallel_for_each() except that either a random access or a forward iterator are expected. I have seen the question about forward iterators and vector but since my iterator's reference type is const value_type& I still think my iterator is ok by the standard. So is a forward iterator returning a proxy object still a valid forward iterator? Or put another way, is it ok for an iterator to have a value type different from a type that is actually stored somewhere in a container?

Compilable example:

#include <vector>
#include <utility>
#include <cassert>
#include <iterator>
#include <memory>
#include <algorithm>
#include <iostream>

#include <ppl.h>


using identifier = int;
struct item
{
    identifier class_id;
    // other data members
    // ...

    bool operator<(const item &rhs) const
    {
        return class_id < rhs.class_id;
    }

    bool operator==(const item &rhs) const
    {
        return class_id == rhs.class_id;
    }

    //inverse operators omitted
};
using container = std::vector<item>;
using item_iterator = typename container::iterator;
using class_range = std::pair<item_iterator, item_iterator>;

class per_class_iterator : public std::iterator<std::forward_iterator_tag, class_range>
{
public:
    per_class_iterator() = default;
    per_class_iterator(const per_class_iterator&) = default;
    per_class_iterator& operator=(const per_class_iterator&) = default;

    explicit per_class_iterator(container &data) :
        data_(std::addressof(data)),
        class_(equal_range(data_->front())), //this would crash for an empty container. assume it's not.
        next_(class_.second)
    {
        assert(!data_->empty()); //a little late here
        assert(std::is_sorted(std::cbegin(*data_), std::cend(*data_)));
    }

    reference operator*()
    {
        //if data_ is unset the iterator is an end iterator. dereferencing end iterators is bad.
        assert(data_ != nullptr);
        return class_;
    }

    per_class_iterator& operator++()
    {
        assert(data_ != nullptr);

        //if we are at the end of our data
        if(next_ == data_->end())
        {
            //reset the data pointer, ie. make iterator an end iterator
            data_ = nullptr;
        }
        else
        {
            //set to the class of the next element
            class_ = equal_range(*next_);
            //and update the next_ iterator
            next_ = class_.second;
        }

        return *this;
    }

    per_class_iterator operator++(int)
    {
        per_class_iterator tmp{*this};
        ++(*this);
        return tmp;
    }

    bool operator!=(const per_class_iterator &rhs) const noexcept
    {
        return (data_ != rhs.data_) ||
            (data_ != nullptr && rhs.data_ != nullptr && next_ != rhs.next_);
    }

    bool operator==(const per_class_iterator &rhs) const noexcept
    {
        return !(*this != rhs);
    }

private:
    class_range equal_range(const item &i) const
    {
        return std::equal_range(data_->begin(), data_->end(), i);
    }

    container* data_ = nullptr;
    class_range class_;
    item_iterator next_;
};

per_class_iterator per_class_begin(container &c)
{
    return per_class_iterator{c};
}

per_class_iterator per_class_end()
{
    return per_class_iterator{};
}

int main()
{
    std::vector<item> items;
    items.push_back({1});
    items.push_back({1});
    items.push_back({3});
    items.push_back({3});
    items.push_back({3});
    items.push_back({5});
    //items are already sorted

//#define USE_PPL
#ifdef USE_PPL
    Concurrency::parallel_for_each(per_class_begin(items), per_class_end(),
#else
    std::for_each(per_class_begin(items), per_class_end(),
#endif
        [](class_range r)
        {
            //this loop *cannot* be parallelized trivially
            std::for_each(r.first, r.second,
                [](item &i)
                {
                    //update item (by evaluating all other items of the same class) ...
                    //building big temporary data structure for all items of same class ...
                    //i.processed = true;
                    std::cout << "item: " << i.class_id << '\n';
                });
        });

    return 0;
}
like image 958
user2460318 Avatar asked May 12 '16 16:05

user2460318


People also ask

How many types of iterators are there?

Input iterators are one of the five main types of iterators present in C++ Standard Library, others being Output iterators, Forward iterator, Bidirectional iterator and Random – access iterators.

What is the point of iterators?

The primary purpose of an iterator is to allow a user to process every element of a container while isolating the user from the internal structure of the container. This allows the container to store elements in any manner it wishes while allowing the user to treat it as if it were a simple sequence or list.


1 Answers

When you're writing a proxy iterator, the reference type should be a class type, precisely because it can outlive the iterator it is derived from. So, for a proxy iterator, when instantiating the std::iterator base should specify the Reference template parameter as a class type, typically the same as the value type:

class per_class_iterator : public std::iterator<
    std::forward_iterator_tag, class_range, std::ptrdiff_t, class_range*, class_range>
                                                                          ~~~~~~~~~~~

Unfortunately, PPL is not keen on proxy iterators and will break compilation:

ppl.h(2775): error C2338: lvalue required for forward iterator operator *
ppl.h(2772): note: while compiling class template member function 'Concurrency::_Parallel_for_each_helper<_Forward_iterator,_Function,1024>::_Parallel_for_each_helper(_Forward_iterator &,const _Forward_iterator &,const _Function &)'
        with
        [
            _Forward_iterator=per_class_iterator,
            _Function=main::<lambda_051d98a8248e9970abb917607d5bafc6>
        ]

This is actually a static_assert:

    static_assert(std::is_lvalue_reference<decltype(*_First)>::value, "lvalue required for forward iterator operator *");

This is because the enclosing class _Parallel_for_each_helper stores an array of pointers and expects to be able to indirect them later:

typename std::iterator_traits<_Forward_iterator>::pointer    _M_element[_Size];

Since PPL doesn't check that pointer is actually a pointer, we can exploit this by supplying a proxy pointer with an operator* and overloading class_range::operator&:

struct class_range_ptr;
struct class_range : std::pair<item_iterator, item_iterator> {
    using std::pair<item_iterator, item_iterator>::pair;
    class_range_ptr operator&();
};
struct class_range_ptr {
    class_range range;
    class_range& operator*() { return range; }
    class_range const& operator*() const { return range; }
};
inline class_range_ptr class_range::operator&() { return{*this}; }

class per_class_iterator : public std::iterator<
    std::forward_iterator_tag, class_range, std::ptrdiff_t, class_range_ptr, class_range&>
{
    // ...

This works great:

item: item: 5
1
item: 3item: 1

item: 3
item: 3
Press any key to continue . . .
like image 187
ecatmur Avatar answered Oct 05 '22 23:10

ecatmur