I need advice for micro optimization in C++ for a vector comparison function, it compares two vectors for equality and order of elements does not matter.
template <class T>
static bool compareVectors(const vector<T> &a, const vector<T> &b)
{
int n = a.size();
std::vector<bool> free(n, true);
for (int i = 0; i < n; i++) {
bool matchFound = false;
for (int j = 0; j < n; j++) {
if (free[j] && a[i] == b[j]) {
matchFound = true;
free[j] = false;
break;
}
}
if (!matchFound) return false;
}
return true;
}
This function is used heavily and I am thinking of possible way to optimize it. Can you please give me some suggestions? By the way I use C++11.
Thanks
A vector quantity has two characteristics, a magnitude and a direction. When comparing two vector quantities of the same type, you have to compare both the magnitude and the direction. On this slide we show three examples in which two vectors are being compared. Vectors are usually denoted on figures by an arrow.
A std::vector can never be faster than an array, as it has (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim and absent in any non-trivial program. One reason for this myth to persist, are examples that compare raw arrays with mis-used std::vectors.
Description. The C++ function std::vector::operator== tests whether two vectors are equal or not. Operator == first checks the size of both container, if sizes are same then it compares elements sequentially and comparison stops at first mismatch.
Vector is faster for insertion and deletion of elements at the end of the container. Set is faster for insertion and deletion of elements at the middle of the container.
It just realized that this code only does kind of a "set equivalency" check (and now I see that you actually did say that, what a lousy reader I am!). This can be achieved much simpler
template <class T>
static bool compareVectors(vector<T> a, vector<T> b)
{
std::sort(a.begin(), a.end());
std::sort(b.begin(), b.end());
return (a == b);
}
You'll need to include the header algorithm
.
If your vectors are always of same size, you may want to add an assertion at the beginning of the method:
assert(a.size() == b.size());
This will be handy in debugging your program if you once perform this operation for unequal lengths by mistake.
Otherwise, the vectors can't be the same if they have unequal length, so just add
if ( a.size() != b.size() )
{
return false;
}
before the sort instructions. This will save you lots of time.
The complexity of this technically is O(n*log(n))
because it's mainly dependent on the sorting which (usually) is of that complexity. This is better than your O(n^2)
approach, but might be worse due to the needed copies. This is irrelevant if your original vectors may be sorted.
If you want to stick with your approach, but tweak it, here are my thoughts on this:
You can use std::find
for this:
template <class T>
static bool compareVectors(const vector<T> &a, const vector<T> &b)
{
const size_t n = a.size(); // make it const and unsigned!
std::vector<bool> free(n, true);
for ( size_t i = 0; i < n; ++i )
{
bool matchFound = false;
auto start = b.cbegin();
while ( true )
{
const auto position = std::find(start, b.cend(), a[i]);
if ( position == b.cend() )
{
break; // nothing found
}
const auto index = position - b.cbegin();
if ( free[index] )
{
// free pair found
free[index] = false;
matchFound = true;
break;
}
else
{
start = position + 1; // search in the rest
}
}
if ( !matchFound )
{
return false;
}
}
return true;
}
Another possibility is replacing the structure to store free positions. You may try a std::bitset
or just store the used indices in a vector and check if a match isn't in that index-vector. If the outcome of this function is very often the same (so either mostly true or mostly false) you can optimize your data structures to reflect that. E.g. I'd use the list of used indices if the outcome is usually false since only a handful of indices might needed to be stored.
This method has the same complexity as your approach. Using std::find to search for things is sometimes better than a manual search. (E.g. if the data is sorted and the compiler knows about it, this can be a binary search).
Your can probabilistically compare two unsorted vectors (u,v) in O(n):
Calculate:
U= xor(h(u[0]), h(u[1]), ..., h(u[n-1]))
V= xor(h(v[0]), h(v[1]), ..., h(v[n-1]))
If U==V then the vectors are probably equal.
h(x) is any non-cryptographic hash function - such as MurmurHash. (Cryptographic functions would work as well but would usually be slower).
(This would work even without hashing, but it would be much less robust when the values have a relatively small range).
A 128-bit hash function would be good enough for many practical applications.
I am noticing that most proposed solution involved sorting booth of the input vectors.I think sorting the arrays compute more that what is strictly necessary for the evaluation the equality of the two vector ( and if the inputs vectors are constant, a copy needs to be made). One other way would be to build an associative container to count the element in each vector... It's also possible to do the reduction of the two vector in parrallel.In the case of very large vector that could give a nice speed up.
template <typename T> bool compareVector(const std::vector<T> & vec1, const std::vector<T> & vec2) {
if (vec1.size() != vec2.size())
return false ;
//Here we assuame that T is hashable ...
auto count_set = std::unordered_map<T,int>();
//We count the element in each vector...
for (unsigned int count = 0 ; count < vec1.size();++count)
{
count_set[vec1[count]]++;
count_set[vec2[count]]--;
} ;
// If everything balance out we should have zero everywhere
return std::all_of(count_set.begin(),count_set.end(),[](const std::pair<T,int> p) { return p.second == 0 ;});
}
That way depend on the performance of your hashsing function , we might get linear complexity in the the length of booth vector (vs n*logn with the sorting). NB the code might have some bug , did have time to check it ...
Benchmarking this way of comparing two vector to sort based comparison i get on ubuntu 13.10,vmware core i7 gen 3 :
Comparing 200 vectors of 500 elements by counting takes 0.184113 seconds
Comparing 200 vectors of 500 elements by sorting takes 0.276409 seconds
Comparing 200 vectors of 1000 elements by counting takes 0.359848 seconds
Comparing 200 vectors of 1000 elements by sorting takes 0.559436 seconds
Comparing 200 vectors of 5000 elements by counting takes 1.78584 seconds
Comparing 200 vectors of 5000 elements by sorting takes 2.97983 seconds
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With