Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to merge multiple STL containers, removing duplicate elements?

Tags:

c++

stl

I have two STL containers that I want to merge, removing any elements that appear more than once. For example:

typedef std::list<int> container;
container c1;
container c2;

c1.push_back(1);
c1.push_back(2);
c1.push_back(3);

c2.push_back(2);
c2.push_back(3);
c2.push_back(4);

container c3 = unique_merge(c1, c2);
// c3 now contains the following 4 elements:
//   1, 2, 3, 4

std::unique seems to be for adjacent elements only, and in my case the containers could be in any order. I could do some std::set trickery I guess:

container unique_merge(const container& c1, const container& c2)
{
    std::set<container::value_type> s;
    BOOST_FOREACH(const container::value_type& val, c1)
        s.insert(val);
    BOOST_FOREACH(const container::value_type& val, c2)
        s.insert(val);
    return container(s.begin(), s.end());
}

Is there a better way or have I missed something bleeding obvious?

like image 592
Rob Avatar asked Nov 11 '08 15:11

Rob


2 Answers

You are going to need to either sort (either explicitly, or implicitly via a sorted container like set).

There is a common idiom using std::sort/std::unique/std::erase to get unique elements in a container.

So create a container with the contents of c1, append the contents of c2, then sort, move unique elements to the end, and erase them. Something like this:

container c(c1.begin(), c1.end());
c.insert(c.end(), c2.begin(), c2.end());
c.erase(std::unique(c.begin(), c.end()), c.end());
like image 136
Chris Morley Avatar answered Sep 29 '22 11:09

Chris Morley


For an unordered lists, your set trick is probably one of the best. It each insert should be O(log n), with N inserts required, and traversing will be O(n), giving you O(N*log n). The other option is to run std::sort on each list individually and then walk through them in parallel using std::set_union, which removes duplicates for you. This will also be O(n*log n), so if you're worried about performance, you'll have to profile. If you're not, do whichever makes more sense to you.

Edit: set_union will only work if there are no duplicates in the original lists, otherwise you'll have to go with sort, merge, unique and erase. The big O performance is still the same, with the same caveats about profiling.

template <typename container>
container unique_merge(container c1, container c2)
{
    std::sort(c1.begin(), c1.end());
    std::sort(c2.begin(), c2.end());
    container mergeTarget;
    std::merge(c1.begin(), c1.end(), c2.begin(), c2.end(), 
        std::insert_iterator(mergeTarget, mergeTarget.end())
    );
    std::erase(
        std::unique(mergeTarget.begin(), mergeTarget.end()), 
        mergeTarget.end()
    );

    return mergeTarget;
}
like image 43
Eclipse Avatar answered Sep 29 '22 12:09

Eclipse