Counting the number of substrings

Question

I would like to ask if there is an algorithm for counting the number of discrete occurencies of a substring in a string in O(n) time.

j_random_hacker · Accepted Answer

[EDIT 17/11/2013: Count the leaf nodes. Thanks Vinicius Pinto!]

You can build a suffix tree on the text in linear time. Then, search for your substring in the suffix tree; when you find it, count the number of leaf nodes beneath the matching node. This is O(m + k) for a substring of length m that appears k times (add an n term for building the suffix tree). Or, you can precalculate the number of descendants for each node in the tree using a depth-first traversal -- this will give O(m) queries.

For large texts, suffix arrays are often faster than suffix trees in practice, despite searching for a single length-m string dropping from O(m) to O(m log m). In this case, all occurrences of a particular substring will appear as a contiguous block in the suffix array, so the width of this block is the number of occurrences.

IVlad · Answer

You can use the KMP Algorithm and modify it to increment a counter instead of returning.

Another possibility is the Rabin-Karp algorithm, however this relies on hashing, so you either have to accept the possibility of false positives while keeping the complexity linear, or accept the possibility of quadratic complexity while keeping the results 100% correct.

Counting the number of substrings

Tags:

algorithm

pattern-matching

GioTse

2 Answers

j_random_hacker

IVlad

Recent Activity

Donate For Us

Counting the number of substrings

Tags:

algorithm

pattern-matching

GioTse

2 Answers

j_random_hacker

IVlad

Related questions

Recent Activity

Donate For Us