Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A reduce function (for many set unions) in C++

Tags:

c++

set

stl

reduce

What I am trying to do: I have a simple set union function in C++ using STL, and I'm trying to wrap it in a function that will let me perform the union of arbitrarily many sets contained in STL data structures (e.g. std::list, std::vector, std::forward_list, ...).

How I tried to do it: To start, my simple set union:

#include <algorithm>
template <typename set_type>
set_type sunion(const set_type & lhs, const set_type & rhs)
{
  set_type result;
  std::set_union( lhs.begin(), lhs.end(), rhs.begin(), rhs.end(), std::inserter(result, result.end()) );
  return result;
}

where set_type defines some STL std::set<T>, e.g. std::set<int>.

After noticing several times that I end up needing to perform several unions on iterators of sets (in Python this would be a reduce of my sunion function over some iterable object of set_types). For instance, I might have

std::vector<std::set<int> > all_sets;

or

std::list<std::set<int> > all_sets;

etc., and I want to get the union of all sets in all_sets. I am trying to implement a simple reduce for this, which essentially does a (faster, more elegant, non-copying) version of:

sunion(... sunion( sunion( all_sets.begin(), all_sets.begin()+1 ), all_sets.begin()+2 ) , ... )

Essentially, to do this quickly, I just want to declare a set_type result and then iterate through all_sets and insert value in every set in all_sets into the result object:

template <typename set_type>
set_type sunion_over_iterator_range(const std::iterator<std::forward_iterator_tag, set_type> & begin, const std::iterator<std::forward_iterator_tag, set_type> & end)
{
  set_type result;
  for (std::iterator<std::forward_iterator_tag, set_type> iter = begin; iter != end; iter++)
    {
      insert_all(result, *iter);
    }
  return result;
}

where insert_all is defined:

// |= operator; faster than making a copy and performing union
template <typename set_type>
void insert_all(set_type & lhs, const set_type & rhs)
{
  for (typename set_type::iterator iter = rhs.begin(); iter != rhs.end(); iter++)
    {
      lhs.insert(*iter);
    }
}

How it didn't work: Unfortunately, my sunion_over_iterator_range(...) doesn't work with arguments std::vector<set_type>::begin(), std::vector<set_type>::end(), which are of type std::vector<set_type>::iterator. I thought std::vector<T>::iterator returns an iterator<random_access_iterator_tag, T>. A

After compilation failed because of type incompatibility of the iterators, I looked at the stl vector source (located in /usr/include/c++/4.6/bits/stl_vector.h for g++ 4.6 & Ubuntu 11.10), and was surprised to see the typedef for vector<T>::iterator to be typedef __gnu_cxx::__normal_iterator<pointer, vector> iterator;. I had thought that a ForwardIterator was a subtype of RandomAccessIterator, and so should be fine, but clearly I was incorrect, or I would not be here.

How I am grateful and ashamed of inciting your frustration due to my inexperience: Apologies if I'm showing my ignorance-- I am trying to learn to be a better object oriented programmer (in the past I have simply hacked everything out in C-style code).

I'm doing my best, coach! Please help me out and spare the world from bad code that I would produce without your code ninja insight...

like image 513
user Avatar asked Mar 27 '12 21:03

user


2 Answers

Here's a very naive approach:

std::set<T> result;
std::vector<std::set<T>> all_sets;

for (std::set<T> & s : all_sets)
{
    result.insert(std::make_move_iterator(s.begin()),
                  std::make_move_iterator(s.end()));
}

This invalidates the elements in the source sets, though it doesn't actually move the element nodes over. If you want to leave the source sets intact, just remove the make_move_iterator.

Unfortunately there's no interface for std::set that lets you "splice" two sets in a way that doesn't reallocate the internal tree nodes, so this is more or less as good as you can get.


Here's a variadic template approach:

template <typename RSet> void union(RSet &) { }

template <typename RSet, typename ASet, typename ...Rest>
void union(RSet & result, ASet const & a, Rest const &... r)
{
    a.insert(a.begin(), a.end());
    union(result, r...);
}

Usage:

std::set<T> result
union(result, s1, s2, s3, s4);

(Similar move-optimizations are feasible here; you can even add some branching that will copy from immutables but move from mutables, or from rvalues only, if you like.)


Here's a version using std::accumulate:

std::set<T> result =
   std::accumulate(all_sets.begin(), all_sets.end(), std::set<T>(),
                   [](std::set<T> & s, std::set<T> const & t)
                     { s.insert(t.begin(), t.end()); return s; }    );

This version seems to rely on return value optimisation a lot, though, so you might like to compare it to this hacked up and rather ugly version:

std::set<T> result;
std::accumulate(all_sets.begin(), all_sets.end(), 0,
                [&result](int, std::set<T> const & t)
                { result.insert(t.begin(), t.end()); return 0; } );
like image 200
Kerrek SB Avatar answered Oct 13 '22 00:10

Kerrek SB


Usually, when using iterators we don't care about the actual category. Just let the implementation sort it out. That means, just change the function to accept any type:

template <typename T>
typename std::iterator_traits<T>::value_type sunion_over_iterator_range(T begin, T end)
{
   typename std::iterator_traits<T>::value_type result;
   for (T iter = begin; iter != end; ++ iter)
   {
      insert_all(result, *iter);
   }
   return result;
}

Note that I have used typename std::iterator_traits<T>::value_type, which is the type of *iter.

BTW, the iterator pattern is not related to OOP. (That doesn't mean it's a bad thing).

like image 43
kennytm Avatar answered Oct 13 '22 00:10

kennytm