Given a string s
of length n
, is it possible to count the number of distinct substrings in s
in O(n)?
Example
Input: abb
Output: 5
('abb', 'ab', 'bb', 'a', 'b'
)
I have done some research but i can't seem to find an algorithm that solves this problem in such an efficient way. I know a O(n^2) approach is possible, but is there a more efficient algorithm?
I don't need to obtain each of the substrings, just the total number of distinct ones (in case it makes a difference).
Number of substrings of a string of length n is (n * (n + 1) / 2).
Approach: The count of sub-strings of length n will always be len – n + 1 where len is the length of the given string.
A substring is completely defined by two parameters, such as start and length, in the range [1:n], so there are no more than n^2 possibilities. A general, non-contiguous, subset needs n true-false decisions to specify it, so there are 2^n possibilities.
You can use Ukkonen's algorithm to build a suffix tree in linear time:
https://en.wikipedia.org/wiki/Ukkonen%27s_algorithm
The number of substrings of s is then the number of prefixes of strings in the trie, which you can calculate simply in linear time. It's just total number of characters in all nodes.
For instance, your example produces a suffix tree like:
/\
b a
| b
b b
5 characters in the tree, so 5 substrings. Each unique string is a path from the root ending after a different letter: abb, ab, a, bb, b. So the number of strings is the number of letters in the tree.
More precisely:
NOTE for people who are wondering how it could be possible to build a tree that contains O(N^2) characters in O(N) time:
There's a trick to the representation of a suffix tree. Instead of storing the actual strings in the nodes of the tree, you just store pointers into the orignal string, so the node that contains "abb" doesn't have "abb", it has (0,3) -- 2 integers per node, regardless of how long the string in each node is, and the suffix tree has O(N) nodes.
Construct the LCP array and subtract its sum from the number of substrings (n(n+1)/2).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With