unordered set intersection in C++

Question

Here is my code, wondering any ideas to make it faster? My implementation is brute force, which is for any elements in a, try to find if it also in b, if so, put in result set c. Any smarter ideas is appreciated.

#include <iostream>
#include <unordered_set>

int main() {
    std::unordered_set<int> a = {1,2,3,4,5};
    std::unordered_set<int> b = {3,4,5,6,7};
    std::unordered_set<int> c;
    for (auto i = a.begin(); i != a.end(); i++) {
        if (b.find(*i) != b.end()) c.insert(*i);
    }
    for (int v : c) {
        std::printf("%d 
", v);
    }
}

Angew is no longer proud of SO · Accepted Answer

Asymptotically, your algorithm is as good as it can get.

In practice, I'd add a check to loop over the smaller of the two sets and do lookups in the larger one. Assuming reasonably evenly distributed hashes, a lookup in a std::unoredered_set takes constant time. So this way, you'll be performing fewer such lookups.

stephane k. · Answer

You can do it with std::copy_if()

std::copy_if(a.begin(), a.end(), std::inserter(c, c.begin()), [b](const int element){return b.count(element) > 0;} );

doron · Answer

Your algorithm is as good as it gets for a unordered set. however if you use a std::set (which uses a binary tree as storage) or even better a sorted std::vector, you can do better. The algorithm should be something like:

get iterators to a.begin() and b.begin()
if the iterators point to equal element add to intersection and increment both iterators.
Otherwise increment the iterator pointing to the smallest value
Go to 2.

Both should be O(n) time but using a normal set should save you from calculating hashes or any performance degradation that arises from hash collisions.

Aconcagua · Answer

Thanks Angew, why your method is faster? Could you elaborate a bit more?

Well, let me provide you some additional info...

It should be pretty clear that, whichever data structures you use, you will have to iterate over all elements in at least one of those, so you cannot get better than O(n), n being the number of elements in the data structure selected to iterate over. Elementary now is, how fast you can look up the elements in the other structure – with a hash set, which std::unordered_set actually is, this is O(1) – at least if the number of collisions is small enough ("reasonably evenly distributed hashes"); the degenerate case would be all values having the same key...

So far, you get O(n) * O(1) = O(n). But you still the choice: O(n) or O(m), if m is the number of elements in the other set. OK, in complexity calculations, this is the same, we have a linear algorithm anyway, in practice, though, you can spare some hash calculations and look-ups if you choose the set with the lower number of elements...

unordered set intersection in C++

Tags:

c++

stl

unordered-set

Lin Ma

4 Answers

Angew is no longer proud of SO

stephane k.

doron

Aconcagua

Recent Activity

Donate For Us

unordered set intersection in C++

Tags:

c++

stl

unordered-set

Lin Ma

4 Answers

Angew is no longer proud of SO

stephane k.

doron

Aconcagua

Related questions

Recent Activity

Donate For Us