Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

The fastest way to find union of sets

I have sets of pairs of int like set<pair<int,int> > x1, x2, ... xn ( n can be between 2 and 20). What is the fastest way to find union of those sets ?

Sorry If I wasn't make clear at the beginning, I meant fast in performance, memory allocation is not a problem.

like image 843
Damir Avatar asked Jul 06 '12 12:07

Damir


People also ask

How do you find the union of sets?

To find the union of two sets, we take X and Y, which contains all the elements of X and all the elements of Y such that no element is repeated. The symbol for representing the union of sets is '∪'.

Where do I find AUB in sets?

Let A and B be the two sets. The union of A and B is the set of all those elements which belong either to A or to B or both A and B. Now we will use the notation A U B (which is read as 'A union B') to denote the union of set A and set B. Thus, A U B = {x : x ∈ A or x ∈ B}.

How do you find the union in a Venn diagram?

Venn diagrams consist of a series of overlapping circles, each circle representing a category. To represent the union of two sets, we use the ∪ symbol — not to be confused with the letter 'u. '


2 Answers

Assuming that the result needs to be a set too, then you have no choice but to insert every element of each x_i into that result set. So the obvious implementation is:

set<pair<int,int>> x(x1);
x.insert(x2.begin(), x2.end());
// etc

The remaining question is whether this can be beaten for speed.

The single-element insert takes a position hint, which if correct speeds up insertion. So it might turn out that something like this is faster than x.insert(x2.begin(), x2.end());:

auto pos = x.begin()
for (auto it = x2.begin(); it != x2.end(); ++it) {
    pos = x.insert(pos, *it);
}

It depends on the data, though: that position may or may not be accurate. You can ensure that it is by putting all the elements in order before you start, for which the best tool is probably set_union. That might better be named merge_and_dedupe_sorted_ranges, because what it does has nothing particularly to do with std::set. You could either set_union into intermediate vectors, or else into sets like this:

set<pair<int,int>> x;
set_union(x1.begin(), x1.end(), x2.begin(), x2.end(), inserter(x, x.end());

My concern with using set_union is that in order to get the benefit of adding the elements to a set in increasing order, you need to create a new empty container each time you call it (because if it's not empty then the elements added need to interleave with the values already in it). The overhead of these containers might be higher than the overhead of inserting into a set in arbitrary order: you would have to test it.

like image 200
Steve Jessop Avatar answered Sep 18 '22 09:09

Steve Jessop


Unfortunately, I believe that you are limited to a linear O(N) solution, as all a union would be is a combination of the elements in both sets.

template<typename S>
S union_sets(const S& s1, const S& s2)
{
     S result = s1;

     result.insert(s2.cbegin(), s2.cend());

     return result;
}
like image 34
Richard J. Ross III Avatar answered Sep 21 '22 09:09

Richard J. Ross III