Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast search algorithm with std::vector<std::string>

Tags:

c++

search

vector

    for (std::vector<const std::string>::const_iterator it = serverList.begin(); it != serverList.end(); it++)
    {
        // found a match, store the location
        if (index == *it) // index is a string
        {
            indexResult.push_back(std::distance(serverList.begin(), it)); // std::vector<unsigned int>
        }
    }

I Have written the above code to look through a vector of strings and return another vector with the location of any "hits".

Is there a way to do the same, but faster? (If I have 10,000 items in the container, it will take a while). Please note that I have to check ALL of the items for matches and store its position in the container.

Bonus Kudos: Anyone know any way/links on how I can make the search so that it finds partial results (Example: search for "coolro" and store the location of variable "coolroomhere")

like image 971
maffo Avatar asked Nov 14 '11 13:11

maffo


People also ask

Is std::vector fast?

A std::vector can never be faster than an array, as it has (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim and absent in any non-trivial program. One reason for this myth to persist, are examples that compare raw arrays with mis-used std::vectors.

What is the difference between std :: string and std::vector?

std::string offers a very different and much expanded interface compared to std::vector<> . While the latter is just a boring old sequence of elements, the former is actually designed to represent a string and therefore offers an assortment of string-related convenience functions.

Is STD array faster than vector C++?

In both cases, the array version goes almost twice as fast as the vector version.

How do you find something in a vector C++?

You can use the find function, found in the std namespace, ie std::find . You pass the std::find function the begin and end iterator from the vector you want to search, along with the element you're looking for and compare the resulting iterator to the end of the vector to see if they match or not.


2 Answers

Basically, you're asking if it's possible to check all elements for a match, without checking all elements. If there is some sort of external meta-information (e.g. the data is sorted), it might be possible (e.g. using binary search). Otherwise, by its very nature, to check all elements, you have to check all elements.

If you're going to do many such searches on the list and the list doesn't vary, you might consider calculating a second table with a good hash code of the entries; again depending on the type of data being looked up, it could be more efficient to calculate the hash code of the index, and compare hash codes first, only comparing the strings if the hash codes were equal. Whether this is an improvement or not largely depends on the size of the table and the type of data in it. You might also, be able to leverage off knowledge about the data in the strings; if they are all URL's, for example, mostly starting with "http://www.", starting the comparison at the tenth character, and only coming back to compare the first 10 if all of the rest are equal, could end up with a big win.

With regards to finding substrings, you can use std::search for each element:

for ( auto iter = serverList.begin();
        iter != serverList.end();
        ++ iter ) {
    if ( std::search( iter->begin(), iter->end(),
                      index.begin(), index.end() ) != iter->end() ) {
        indexResult.push_back( iter - serverList.begin() );
    }
}

Depending on the number of elements being searched and the lengths of the strings involved, it might be more efficient to use something like BM search, however, precompiling the search string to the necessary tables before entering the loop.

like image 86
James Kanze Avatar answered Oct 09 '22 16:10

James Kanze


Use binary_search after sorting the vector

  1. std::sort( serverList.begin() , serverList.end() )
  2. std::lower_bound(serverList.begin() , serverList.end() , valuetoFind) to find first matching
  3. Use std::equal_range if you want to find all matching elements

The lower_bound & equal_range search because it is binary is logarithmic compared to your search that is O(N)

like image 43
parapura rajkumar Avatar answered Oct 09 '22 18:10

parapura rajkumar