I find myself needing to return the size of the intersection of two vectors:
std::vector<int> A_, B_
I do not require the intersected values, just the size of the set. This function needs to be called a very large number of times. This is part of a much bigger simulation over a (mathematical) graph/network.
My working conditions are:
My first attempt, using a naive loop, is below. But I think this may not be enough. I've assumed...that std::set_intersection will be too onerous due to repeated sorts and allocations.
int vec_intersect(const std::vector<int>& A_, const std::vector<int>& B_) {
int c_count=0;
for(std::vector<int>::const_iterator it = A_.begin(); it != A_.end(); ++it){
for(std::vector<int>::const_iterator itb = B_.begin(); itb != B_.end(); ++itb){
if(*it==*itb) ++c_count;
}
}
return c_count;
}
Given my conditions above, how else can I implement this to gain speed, relatively easily? Should I be thinking about hash tables or going with sorts and STL, or different containers?
Your algorithm is O(n2) in the number of elements (assuming that the size of both vectors is approximately equal to n
). Here is an O(n) algorithm:
std::unordered_set<int>
A
into the setB
, checking that they are present in the unordered_set
, and incrementing the count for each item that is present.Here is an implementation in C++11, using a lambda for brevity:
vector<int> a {2, 3, 5, 7, 11, 13};
vector<int> b {1, 3, 5, 7, 9, 11};
unordered_set<int> s(a.begin(), a.end());
int res = count_if(b.begin(), b.end(), [&](int k) {return s.find(k) != s.end();});
// Lambda above captures the set by reference. count_if passes each element of b
// to the lambda. The lambda returns true if there is a match, and false otherwise.
(this prints 4
; demo)
Your algorithm is O(n*m), where n and m are the number of elements in the vectors.
If you don't have issues where the input data is untrusted, you'll probably have the best results with:
A
into an unordered_set
B
, if it is in the set, increment your counter.For example:
int vec_intersect(const std::vector<int>& A_, const std::vector<int>& B_)
{
std::unordered_set<int> aSet(A_.cbegin(), A_.cend());
return std::count_if(B_.cbegin(), B_.cend(), [&](int element) {
return aSet.find(element) != aSet.end();
});
}
This will probabilistically give O(m + n) results. (Hash tables are almost always O(1), but if an attacker can force many collisions in the table they could force O(n) behavior, leading to denial of service)
If you require deterministic results, and the order of the vectors does not matter, sorting one vector will work, which is only O(m lg m + m + n). That is:
For example:
int vec_intersect(std::vector<int>& A_, const std::vector<int>& B_)
{
std::sort(A_.begin(), A_.end());
return std::count_if(B_.cbegin(), B_.cend(), [&](int element) {
return std::binary_search(A_.cbegin(), A_.cend(), element);
});
}
Just for giggles, here's an <algorithm>
-ized version of your algorithm:
int vec_intersect(const std::vector<int>& A_, const std::vector<int>& B_)
{
return std::count_if(B_.cbegin(), B_.cend(), [&](int element) {
return std::find(A_.cbegin(), A_.cend(), element) != A_.cend();
});
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With