Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting the number of substrings

I would like to ask if there is an algorithm for counting the number of discrete occurencies of a substring in a string in O(n) time.

like image 335
GioTse Avatar asked Jun 19 '11 11:06

GioTse


2 Answers

[EDIT 17/11/2013: Count the leaf nodes. Thanks Vinicius Pinto!]

You can build a suffix tree on the text in linear time. Then, search for your substring in the suffix tree; when you find it, count the number of leaf nodes beneath the matching node. This is O(m + k) for a substring of length m that appears k times (add an n term for building the suffix tree). Or, you can precalculate the number of descendants for each node in the tree using a depth-first traversal -- this will give O(m) queries.

For large texts, suffix arrays are often faster than suffix trees in practice, despite searching for a single length-m string dropping from O(m) to O(m log m). In this case, all occurrences of a particular substring will appear as a contiguous block in the suffix array, so the width of this block is the number of occurrences.

like image 68
j_random_hacker Avatar answered Sep 23 '22 08:09

j_random_hacker


You can use the KMP Algorithm and modify it to increment a counter instead of returning.

Another possibility is the Rabin-Karp algorithm, however this relies on hashing, so you either have to accept the possibility of false positives while keeping the complexity linear, or accept the possibility of quadratic complexity while keeping the results 100% correct.

like image 30
IVlad Avatar answered Sep 20 '22 08:09

IVlad