Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary search in std::vector

I am trying to look for the position of vector elements into another vector. Here i am interested to use an implementation as fast as binary search. I have different vectors of length 1 million or more, so i am trying to achieve something faster.

Following situations in my case:

1) vector in which i am searching is sorted.

2) The element i am searching for will always be there i.e i don't have a case of not found, and i would like to get the index of vector elements in a faster way.

I tried the following code to get the index of vector elements.

#include <iostream>
#include <vector>
#include <algorithm>

template<class Iter, class T>
Iter binary_find(Iter begin, Iter end, T val)
{
    Iter i = std::lower_bound(begin, end, val);
    return i;
}

int main() {
    std::vector<std::string> values = {"AAAAAA","AB", "AD" ,"BCD","CD", "DD" };
    std::vector<std::string> tests = {"AB", "CD","AD", "DD"};
    for(int i=0 ; i < tests.size(); i++) {
        int pos = binary_find(values.begin(), values.end(), tests.at(i))- values.begin();
    std::cout << tests.at(i) << " found at: " << pos <<std::endl;
    }
    return 0;
}  

I would like to know if the code matches with the binary search implementation.??

Is there a faster way to get the index of vector elements?

Any further suggestions to improve this code.

like image 443
Agaz Wani Avatar asked May 12 '16 10:05

Agaz Wani


3 Answers

binary_find doesn't return anything despite not declared to return void, so it has undefined behaviour.

After it is fixed, and assuming that you have no specific knowledge about the contents of the vector other than it is sorted, binary search is pretty much optimal.

There are however, other data structures that are faster for predicate based lookup than a vector. If performance is critical, you should take a look at search trees and hash maps. Since your keys are strings, tries and directed acyclic word graph in particular may be efficient. You may want to measure which is best for your use case.

like image 176
eerorika Avatar answered Oct 01 '22 02:10

eerorika


http://www.cpluplus.com says that the behavior of binary_search is equivalent to:

template <class ForwardIterator, class T>
bool binary_search (ForwardIterator first, ForwardIterator last, const T& val) {
    first = std::lower_bound(first, last, val);
    return (first != last && !(val < *first));
}

So yes, lower_bound is your weapon of choice. But when you take the difference you should use distance. Cause, if there is a faster way to acquire the position it will be rolled into that function.

As far as other improvements, I'd suggest using C++14's begin and end and not calling a function which only serves to wrap a lower_bound (and fail to properly return a value.) So the way I'd write this code would look like:

auto pos = distance(lower_bound(begin(values), end(values), tests[i]), begin(values));
like image 20
Jonathan Mee Avatar answered Oct 01 '22 02:10

Jonathan Mee


Q1: I would like to know if the code matches with the binary search implementation.??

Yes, it (almost) is. Check std::lower_bound, which states:

Complexity:

On average, logarithmic in the distance between first and last: Performs approximately log2(N)+1 element comparisons (where N is this distance). On non-random-access iterators, the iterator advances produce themselves an additional linear complexity in N on average.


Q2: Is there a faster way to get the index of vector elements.??

That's a rather broad question.


Q3: Any further suggestions to improve this code.

Hello world, Code Review!


PS - Did you even compile the code? It gives several messages, like:

warning: no return statement in function returning non-void [-Wreturn-type]

Compile with warnings enabled, like this:

g++ -Wall main.cpp
like image 23
gsamaras Avatar answered Oct 01 '22 02:10

gsamaras