If the title sounds weird here is another explanation:
If I have a range a, and I want to count how many times another range b appeared in range a, is there a std::
function to do it?
If no, is there a simple way to do it (ofc I can manually loop using std::search
- I'm talking about something more elegant)?
I think you'll need to build your own. Here's what comes to my mind as an implementation.
template <typename Iterator1, typename Iterator2>
size_t subsequence_count(Iterator1 haystack_begin, Iterator1 haystack_end, Iterator2 needle_begin, Iterator2 needle_end) {
size_t count = 0;
while (true) {
haystack_begin = std::search(haystack_begin, haystack_end, needle_begin, needle_end);
if (haystack_begin == haystack_end)
return count;
count++;
haystack_begin++;
}
}
template <typename Iterator1, typename Iterator2, typename BinaryPredicate>
size_t subsequence_count(Iterator1 haystack_begin, Iterator1 haystack_end, Iterator2 needle_begin, Iterator2 needle_end, BinaryPredicate predicate) {
size_t count = 0;
while (true) {
haystack_begin = std::search(haystack_begin, haystack_end, needle_begin, needle_end, predicate);
if (haystack_begin == haystack_end)
return count;
count++;
haystack_begin++;
}
}
Some code using this:
int main() {
std::vector<int> haystack = {1, 19, 7, 23, 2, 19, 19, 19, 19};
std::vector<int> needle = {19, 19};
assert(subsequence_count(begin(haystack), end(haystack), begin(needle), end(needle) == 3);
}
You could use std::count_if on range A with a lambda that uses std::find on range B.
EDIT: Replaced std::search with std::find.
If you want to do this more efficiently than O(nm)
, for m
characters in the pattern, n
in the string to be searched, you can consider a suffix tree. Essentially what this means is that you build a specialized tree structure that simultaneously describes all possible suffixes of a string. So if your string is "ratatat", your suffix string will simultaneously represent "ratatat", "atatat", "tatat", "atat", "tat", "at", and "t". So you can find (and count) all occurrences of a particular string very quickly once you've built the tree. Of course building it takes some programming effort, and some memory!
There is a very nice description here (this refers to Skiena's book The Algorithm Design Manual, which is where I read about them).
PS I started off searching for suffix tree C++ implementations. There are several useful stackoverflow questions on this, but nothing in std so far as I can tell.
Edit to add alternative algorithm
On second thoughts, I think that Boyer-Moore string matching is probably a better solution, not least because there is a boost
implementation readily available -- and all you've said you want to do is find a particular search string (suffix trees are good if you want to count the occurrences of different search strings). Essentially what the b-m algorithm does is take advantage of structure in the search string to jump ahead when there is a mismatch, using a precomputed table for the search string (c.f. suffix trees which pre-process the string to be searched). So you ought to be able to manually loop with the boyer-moore boost search (rather than with the std search) and get a significant efficiency gain.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With