I would like to ask if there is an algorithm for counting the number of discrete occurencies of a substring in a string in O(n) time.
[EDIT 17/11/2013: Count the leaf nodes. Thanks Vinicius Pinto!]
You can build a suffix tree on the text in linear time. Then, search for your substring in the suffix tree; when you find it, count the number of leaf nodes beneath the matching node. This is O(m + k) for a substring of length m that appears k times (add an n term for building the suffix tree). Or, you can precalculate the number of descendants for each node in the tree using a depth-first traversal -- this will give O(m) queries.
For large texts, suffix arrays are often faster than suffix trees in practice, despite searching for a single length-m string dropping from O(m) to O(m log m). In this case, all occurrences of a particular substring will appear as a contiguous block in the suffix array, so the width of this block is the number of occurrences.
You can use the KMP Algorithm and modify it to increment a counter instead of returning.
Another possibility is the Rabin-Karp algorithm, however this relies on hashing, so you either have to accept the possibility of false positives while keeping the complexity linear, or accept the possibility of quadratic complexity while keeping the results 100% correct.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With