
how to use C++ STL and boost to tell if two sorted vectors intersect

Tags: c++, set, boost

I have two sorted C++ std::vectors without duplicates (you could call them sets) and I want to know if they intersect. I do not need the vector of common elements.

I wrote the code at the end of this question using the boost::set_intersection algorithm in the boost "range" library (http://www.boost.org/doc/libs/1_50_0/libs/range/doc/html/range/reference/algorithms/set.html). This code avoids constructing the set of common elements but does scan all the elements of the vectors.

Is it possible to improve my function "intersects" using boost and the C++ STL without using a loop? I'd like to stop at the first common element in the vectors or at the very least avoid my counter class.

The boost range library provides "includes" and "set_intersection" but not "intersects". This makes me think that "intersects" is trivial or provided elsewhere but I cannot find it.

thanks!

#include <cstddef>   // size_t
#include <iostream>  // std::cout, used in main
#include <string>
#include <vector>
#include <boost/assign/list_of.hpp>
#include <boost/function_output_iterator.hpp>
#include <boost/range/algorithm.hpp>
#include <boost/range/algorithm_ext/erase.hpp>

template<typename T>
class counter
{
    size_t * _n;
public:
    counter(size_t * b) : _n(b) {}
    void operator()(const T & x) const
    {
        ++*_n;
    }
};

bool intersects(const std::vector<std::string> & a, const std::vector<std::string> & b)
{
    size_t found = 0;
    boost::set_intersection(a, b, boost::make_function_output_iterator(counter<std::string>(&found)));
    return found != 0;
}

int main(int argc, char ** argv)
{
    namespace ba = boost::assign;
    using namespace std;
    vector<string> a = ba::list_of(string("b"))(string("vv"))(string("h"));
    vector<string> b = ba::list_of(string("z"))(string("h"))(string("aa"));
    boost::erase(a, boost::unique<boost::return_found_end>(boost::sort(a)));
    boost::erase(b, boost::unique<boost::return_found_end>(boost::sort(b)));
    cout << "does " << (intersects(a, b) ? "" : "not ") << "intersect\n";
    return 0;
}
Stuart Pook asked Oct 07 '22 07:10



1 Answer

Firstly, to answer the comment: Boost's set_intersection takes ranges as parameters, whereas the STL one takes iterator pairs.

Other than that, there is no real difference with regards to the algorithm and the complexity.

As far as I know there is no ready-made library function that does what you want, which is just to test whether two sequences are disjoint and stop immediately if they are not.

You also have to realise that you will always hit the worst case when they really are disjoint, because the scan can never stop early on a match.

The complexity of a linear merge is O(N+M). You can instead binary-search one collection for each element of the other, which makes it O(N log M) or O(M log N); if one is a lot larger than the other this can be a big saving (e.g. with N=1000000 and M=20, M log N is only about 400 comparisons).
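The binary-search variant could be sketched roughly as follows. The function name intersects_by_search is made up for illustration, and the sketch assumes both vectors are already sorted with operator<, as in the question.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): for each element of the
// shorter vector, binary-search the longer one. Both inputs must be sorted.
bool intersects_by_search(const std::vector<std::string>& a,
                          const std::vector<std::string>& b)
{
    typedef std::vector<std::string>::const_iterator Iter;
    const std::vector<std::string>& shorter = a.size() <= b.size() ? a : b;
    const std::vector<std::string>& longer  = a.size() <= b.size() ? b : a;

    Iter from = longer.begin();
    for (Iter it = shorter.begin(); it != shorter.end(); ++it)
    {
        // Because both inputs are sorted, each search can resume where the
        // previous one stopped instead of restarting at longer.begin().
        from = std::lower_bound(from, longer.end(), *it);
        if (from == longer.end())
            return false;       // the rest of `shorter` exceeds all of `longer`
        if (!(*it < *from))     // lower_bound guarantees *from >= *it, so this is equality
            return true;
    }
    return false;
}
```

Called as intersects_by_search(a, b) on the two vectors from the question, this stops at the first common element. When the sizes are similar, the plain linear merge is likely to be at least as fast.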

You can also "reduce" the problem by divide and conquer: take the median of one collection, find its position in the other, and compare the two resulting pairs of sub-ranges in separate threads.
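A sequential sketch of that median split is shown here. The name intersects_split is hypothetical, random-access iterators are assumed, and the threading is deliberately left out: the two recursive calls work on non-overlapping sub-ranges, so each could be handed to its own thread or task.

```cpp
#include <algorithm>

// Hypothetical helper (not from the original post). Both ranges must be
// sorted with operator< and use random-access iterators.
template <typename Iter>
bool intersects_split(Iter first1, Iter last1, Iter first2, Iter last2)
{
    if (first1 == last1 || first2 == last2)
        return false;                       // an empty range intersects nothing

    // Median of the first range, and the first element of the second range
    // that is not less than it.
    Iter mid1 = first1 + (last1 - first1) / 2;
    Iter pos2 = std::lower_bound(first2, last2, *mid1);
    if (pos2 != last2 && !(*mid1 < *pos2))
        return true;                        // the median itself is a common element

    // Elements below the median can only match elements before pos2, and
    // elements above it can only match elements from pos2 onwards.
    return intersects_split(first1, mid1, first2, pos2)
        || intersects_split(mid1 + 1, last1, pos2, last2);
}
```

Whether the parallel version pays off depends on the sizes involved; for small inputs the cost of starting threads will probably outweigh the saving.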

There is also the "horrible" solution of having the functor that gets called on each common element throw, thus breaking you out of the loop. (Yes, there is a loop there, even if it is hidden in an algorithm.) However, we can write our own O(N+M) version very simply. I will do it with 4 iterators:

 template< typename Iter1, typename Iter2 >
 bool intersects( Iter1 iter1, Iter1 iter1End, Iter2 iter2, Iter2 iter2End )
 {
      while( iter1 != iter1End && iter2 != iter2End )
      {
         if( *iter1 < *iter2 )
         {
             ++iter1;
         }
         else if ( *iter2 < *iter1 )
         {
             ++iter2;
         }
         else
             return true;
      }
      return false;
 }

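 // Predicate version: comp is a three-way comparison (like strcmp or std::string::compare)
 // that returns <0, >0 or 0. It is not a boolean less-than predicate such as std::less.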

 template< typename Iter1, typename Iter2, typename Comp >
 bool intersects( Iter1 iter1, Iter1 iter1End, Iter2 iter2, Iter2 iter2End, Comp comp )
 {
      while( iter1 != iter1End && iter2 != iter2End )
      {
         int res = comp( *iter1, *iter2 );
         if( res < 0 )
         {
             ++iter1;
         }
         else if ( res > 0 )
         {
             ++iter2;
         }
         else
             return true;
      }
      return false;
 }
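For comparison, the exception-based early exit mentioned earlier might look something like the sketch below. It uses std::set_intersection with a hand-written output iterator instead of boost::make_function_output_iterator, purely to keep the example free of Boost. All names (found_common_element, throw_on_write, intersects_by_throw) are invented for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical names throughout; none of this is from the original post.
struct found_common_element {};   // thrown to abandon the algorithm early

// Minimal output iterator whose element assignment throws. The algorithm
// only writes to its output when an element occurs in both ranges.
struct throw_on_write
{
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef void difference_type;
    typedef void pointer;
    typedef void reference;

    throw_on_write& operator*()     { return *this; }
    throw_on_write& operator++()    { return *this; }
    throw_on_write  operator++(int) { return *this; }

    // Called for "*out = element"; copying the iterator itself still uses the
    // implicit copy assignment and does not throw.
    template <typename T>
    throw_on_write& operator=(const T&) { throw found_common_element(); }
};

bool intersects_by_throw(const std::vector<std::string>& a,
                         const std::vector<std::string>& b)
{
    try
    {
        std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), throw_on_write());
    }
    catch (const found_common_element&)
    {
        return true;    // the first common element aborted the scan
    }
    return false;       // the algorithm ran to completion without writing anything
}
```

This does stop at the first common element, but it uses an exception for ordinary control flow, which is why the hand-written loop above is the preferable choice.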
CashCow answered Oct 08 '22 22:10
