Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently compare vectors with C++?

Tags:

c++

c++11

I need advice for micro optimization in C++ for a vector comparison function, it compares two vectors for equality and order of elements does not matter.

template <class T>
static bool compareVectors(const vector<T> &a, const vector<T> &b)
{
  int n = a.size();
  std::vector<bool> free(n, true);
  for (int i = 0; i < n; i++) {
    bool matchFound = false;
    for (int j = 0; j < n; j++) {
      if (free[j] && a[i] == b[j]) {
        matchFound = true;
        free[j] = false;
        break;
      }
    }
    if (!matchFound) return false;
  }
  return true;
}

This function is used heavily and I am thinking of possible way to optimize it. Can you please give me some suggestions? By the way I use C++11.

Thanks

like image 601
user2381422 Avatar asked Jun 30 '13 19:06

user2381422


People also ask

How do you compare two vectors?

A vector quantity has two characteristics, a magnitude and a direction. When comparing two vector quantities of the same type, you have to compare both the magnitude and the direction. On this slide we show three examples in which two vectors are being compared. Vectors are usually denoted on figures by an arrow.

Are C style arrays faster than vectors?

A std::vector can never be faster than an array, as it has (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim and absent in any non-trivial program. One reason for this myth to persist, are examples that compare raw arrays with mis-used std::vectors.

What does the equality operator do with two vectors?

Description. The C++ function std::vector::operator== tests whether two vectors are equal or not. Operator == first checks the size of both container, if sizes are same then it compares elements sequentially and comparison stops at first mismatch.

Is set or vector faster C++?

Vector is faster for insertion and deletion of elements at the end of the container. Set is faster for insertion and deletion of elements at the middle of the container.


3 Answers

It just realized that this code only does kind of a "set equivalency" check (and now I see that you actually did say that, what a lousy reader I am!). This can be achieved much simpler

template <class T>
static bool compareVectors(vector<T> a, vector<T> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return (a == b);
}

You'll need to include the header algorithm.

If your vectors are always of same size, you may want to add an assertion at the beginning of the method:

assert(a.size() == b.size());

This will be handy in debugging your program if you once perform this operation for unequal lengths by mistake.

Otherwise, the vectors can't be the same if they have unequal length, so just add

if ( a.size() != b.size() )
{
   return false;
}

before the sort instructions. This will save you lots of time.

The complexity of this technically is O(n*log(n)) because it's mainly dependent on the sorting which (usually) is of that complexity. This is better than your O(n^2) approach, but might be worse due to the needed copies. This is irrelevant if your original vectors may be sorted.


If you want to stick with your approach, but tweak it, here are my thoughts on this:

You can use std::find for this:

template <class T>
static bool compareVectors(const vector<T> &a, const vector<T> &b)
{
  const size_t n = a.size(); // make it const and unsigned!
  std::vector<bool> free(n, true);
  for ( size_t i = 0; i < n; ++i )
  {
      bool matchFound = false;
      auto start = b.cbegin();
      while ( true )
      {
          const auto position = std::find(start, b.cend(), a[i]);
          if ( position == b.cend() )
          {
              break; // nothing found
          }
          const auto index = position - b.cbegin();
          if ( free[index] )
          {
             // free pair found
             free[index] = false;
             matchFound = true;
             break;
          }
          else
          {
             start = position + 1; // search in the rest
          }
      }
      if ( !matchFound )
      {
         return false;
      }
   }
   return true;
}

Another possibility is replacing the structure to store free positions. You may try a std::bitset or just store the used indices in a vector and check if a match isn't in that index-vector. If the outcome of this function is very often the same (so either mostly true or mostly false) you can optimize your data structures to reflect that. E.g. I'd use the list of used indices if the outcome is usually false since only a handful of indices might needed to be stored.

This method has the same complexity as your approach. Using std::find to search for things is sometimes better than a manual search. (E.g. if the data is sorted and the compiler knows about it, this can be a binary search).

like image 64
stefan Avatar answered Oct 18 '22 02:10

stefan


Your can probabilistically compare two unsorted vectors (u,v) in O(n):

Calculate:

U= xor(h(u[0]), h(u[1]), ..., h(u[n-1]))
V= xor(h(v[0]), h(v[1]), ..., h(v[n-1]))

If U==V then the vectors are probably equal.

h(x) is any non-cryptographic hash function - such as MurmurHash. (Cryptographic functions would work as well but would usually be slower).

(This would work even without hashing, but it would be much less robust when the values have a relatively small range).

A 128-bit hash function would be good enough for many practical applications.

like image 14
Lior Kogan Avatar answered Oct 18 '22 04:10

Lior Kogan


I am noticing that most proposed solution involved sorting booth of the input vectors.I think sorting the arrays compute more that what is strictly necessary for the evaluation the equality of the two vector ( and if the inputs vectors are constant, a copy needs to be made). One other way would be to build an associative container to count the element in each vector... It's also possible to do the reduction of the two vector in parrallel.In the case of very large vector that could give a nice speed up.

template <typename T>  bool compareVector(const std::vector<T> &  vec1, const std::vector<T> & vec2) {
    if (vec1.size() != vec2.size())
        return false ;

    //Here we assuame that T is hashable ...
    auto count_set =  std::unordered_map<T,int>();

    //We count the element in each vector...
    for (unsigned int count = 0 ; count <  vec1.size();++count)
    {
        count_set[vec1[count]]++;
        count_set[vec2[count]]--;
    } ;

    // If everything balance out we should have zero everywhere
    return std::all_of(count_set.begin(),count_set.end(),[](const std::pair<T,int> p) { return p.second == 0 ;});

}

That way depend on the performance of your hashsing function , we might get linear complexity in the the length of booth vector (vs n*logn with the sorting). NB the code might have some bug , did have time to check it ...

Benchmarking this way of comparing two vector to sort based comparison i get on ubuntu 13.10,vmware core i7 gen 3 :

Comparing 200 vectors of 500 elements by counting takes 0.184113 seconds

Comparing 200 vectors of 500 elements by sorting takes 0.276409 seconds

Comparing 200 vectors of 1000 elements by counting takes 0.359848 seconds

Comparing 200 vectors of 1000 elements by sorting takes 0.559436 seconds

Comparing 200 vectors of 5000 elements by counting takes 1.78584 seconds

Comparing 200 vectors of 5000 elements by sorting takes 2.97983 seconds

like image 7
GreyGeek Avatar answered Oct 18 '22 04:10

GreyGeek