Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a function that is to std::search what std::count is to std::find?

Tags:

c++

stl

If the title sounds weird here is another explanation:

If I have a range a, and I want to count how many times another range b appeared in range a, is there a std:: function to do it?

If no, is there a simple way to do it (ofc I can manually loop using std::search - I'm talking about something more elegant)?

like image 326
NoSenseEtAl Avatar asked May 09 '13 15:05

NoSenseEtAl


3 Answers

I think you'll need to build your own. Here's what comes to my mind as an implementation.

template <typename Iterator1, typename Iterator2>
size_t subsequence_count(Iterator1 haystack_begin, Iterator1 haystack_end, Iterator2 needle_begin, Iterator2 needle_end) {
    size_t count = 0;
    while (true) {
        haystack_begin = std::search(haystack_begin, haystack_end, needle_begin, needle_end);
        if (haystack_begin == haystack_end)
            return count;
        count++;
        haystack_begin++;
    }
}

template <typename Iterator1, typename Iterator2, typename BinaryPredicate>
size_t subsequence_count(Iterator1 haystack_begin, Iterator1 haystack_end, Iterator2 needle_begin, Iterator2 needle_end, BinaryPredicate predicate) {
    size_t count = 0;
    while (true) {
        haystack_begin = std::search(haystack_begin, haystack_end, needle_begin, needle_end, predicate);
        if (haystack_begin == haystack_end)
            return count;
        count++;
        haystack_begin++;
    }
}

Some code using this:

int main() {
    std::vector<int> haystack = {1, 19, 7, 23, 2, 19, 19, 19, 19};
    std::vector<int> needle   = {19, 19};

    assert(subsequence_count(begin(haystack), end(haystack), begin(needle), end(needle) == 3);
}
like image 185
Bill Lynch Avatar answered Nov 01 '22 10:11

Bill Lynch


You could use std::count_if on range A with a lambda that uses std::find on range B.

EDIT: Replaced std::search with std::find.

like image 1
TractorPulledPork Avatar answered Nov 01 '22 08:11

TractorPulledPork


If you want to do this more efficiently than O(nm), for m characters in the pattern, n in the string to be searched, you can consider a suffix tree. Essentially what this means is that you build a specialized tree structure that simultaneously describes all possible suffixes of a string. So if your string is "ratatat", your suffix string will simultaneously represent "ratatat", "atatat", "tatat", "atat", "tat", "at", and "t". So you can find (and count) all occurrences of a particular string very quickly once you've built the tree. Of course building it takes some programming effort, and some memory!

There is a very nice description here (this refers to Skiena's book The Algorithm Design Manual, which is where I read about them).

PS I started off searching for suffix tree C++ implementations. There are several useful stackoverflow questions on this, but nothing in std so far as I can tell.

Edit to add alternative algorithm

On second thoughts, I think that Boyer-Moore string matching is probably a better solution, not least because there is a boost implementation readily available -- and all you've said you want to do is find a particular search string (suffix trees are good if you want to count the occurrences of different search strings). Essentially what the b-m algorithm does is take advantage of structure in the search string to jump ahead when there is a mismatch, using a precomputed table for the search string (c.f. suffix trees which pre-process the string to be searched). So you ought to be able to manually loop with the boyer-moore boost search (rather than with the std search) and get a significant efficiency gain.

like image 1
TooTone Avatar answered Nov 01 '22 09:11

TooTone