Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to determine whether elements of a vector y occur in a vector x

Tags:

c++

I have the following problem: I have two vectors x and y of type double that are increasingly sorted and I would like to obtain a vector z indicating whether an element of y is present in x. Up to now, I have used std::binary_search in a for-loop as illustrated below, but I think there should be a faster way making use of the fact that also x is sorted? The issue is that this needs to be super fast as it turns out to be the bottleneck in my code.

For those familiar with R, I need an equivalent to match(y, x, nomatch = 0L) > 0L.

#include <iostream>
#include <algorithm>
#include <vector>

int main() {

    using namespace std;

    vector<double> x = {1.8, 2.4, 3.3, 4.2, 5.6,7.9, 8.5, 9.3};
    vector<double> y = {0.5, 0.98, 1.8, 3.1, 5.6, 6.6, 9.3, 9.3, 9.5};

    vector<bool> z(y.size());
    for (int i = 0; i != y.size(); ++i)
        z[i] = binary_search(x.begin(), x.end(), y[i]);

    for (vector<bool>::const_iterator i = z.begin(); i != z.end(); ++i)
        cout << *i << " ";

    return 0;
}

EDIT

Here are representative sample data for my problem:

#include <iostream>
#include <algorithm>
#include <vector>
#include <cstdlib>
#include <ctime>

// function generator:
double RandomNumber () { return (std::rand() / 10e+7); }

int main() {

    using namespace std;
    std::srand ( unsigned ( std::time(0) ) );

    // 5000 is representative
    int n = 5000;

    std::vector<double> x (n);
    std::generate (x.begin(), x.end(), RandomNumber);

    std::vector<double> y (n);
    std::generate (y.begin(), y.end(), RandomNumber);

    for(std::vector<double>::const_iterator i = x.begin(); i != x.end(); i++) {
    y.push_back(*i);
}

    std::sort(x.begin(), x.end());
    std::sort(y.begin(), y.end());

    return 0;
}
like image 773
NoBackingDown Avatar asked Dec 03 '22 15:12

NoBackingDown


1 Answers

You can use std::set_itersection:

#include <vector>
#include <algorithm>
#include <iterator>
#include <iostream>

int main()
{
    std::vector<double> x {1.8, 2.4, 3.3, 4.2, 5.6,7.9, 8.5, 9.3};
    std::vector<double> y {0.5, 0.98, 1.8, 3.1, 5.6, 6.6, 9.3, 9.3, 9.5};

    std::vector<double> z {};

    std::set_intersection(std::cbegin(x), std::cend(x), 
                          std::cbegin(y), std::cend(y), 
                          std::back_inserter(z));

   std::copy(std::cbegin(z), std::cend(z),
             std::ostream_iterator<double> {std::cout, " "});
}

Edit

To address Dieter Lücking point in the comments, here is a version that more closely matches R's match function:

#include <vector>
#include <deque>
#include <algorithm>
#include <iterator>
#include <functional>
#include <memory>
#include <iostream>

template <typename T>
std::deque<bool> match(const std::vector<T>& y, const std::vector<T>& x)
{
    std::vector<std::reference_wrapper<const T>> z {};
    z.reserve(std::min(y.size(), x.size()));

    std::set_intersection(std::cbegin(y), std::cend(y),
                          std::cbegin(x), std::cend(x),
                          std::back_inserter(z));

    std::deque<bool> result(y.size(), false);

    for (const auto& e : z) {
        result[std::distance(std::addressof(y.front()), std::addressof(e.get()))] = true;
    }

    return result;
}

int main()
{
    std::vector<double> x {1.8, 2.4, 3.3, 4.2, 5.6,7.9, 8.5, 9.3};
    std::vector<double> y {0.5, 0.98, 1.8, 3.1, 5.6, 6.6, 9.3, 9.3, 9.5};

    const auto matches = match(y, x);

    std::copy(std::cbegin(matches), std::cend(matches),
              std::ostream_iterator<bool> {std::cout});
}
like image 127
Daniel Avatar answered Dec 29 '22 00:12

Daniel