I have trouble understanding how the worst case time complexity of constructing a suffix tree is linear - particularly when we need to build a suffix tree for a string that may be composed of repeating single character such as "aaaaa".
Even if I were to construct a compressed suffix tree for "aaaaa", I won't be really able to compress any nodes since no two edges starting out of a node can have string-labels beginning with the same character.
This would result in a suffix tree of height 5, and at each insertion of the suffix, I would need to keep traversing from the root to the leaf.
Here was how I approached: suffixes: a, aa, aaa, aaaa, aaaaa
Create root node, create an edge bearing 'a' and connect this to a new node, where to its left bears "$", and repeat this process until we can aaaaa.
This would result in O(n^2) instead of O(n). What am I missing here?
I agree with the comments, but here are some more details:
The procedure you describe for forming the aaaaa
tree is O(n), not O(n^2). Node and leaf creation are constant-time operations, and you perform them n==5 times. Your assumption of O(n^2) seems to be based on the idea that you'd traverse from root to leaf in each step, but there is no need to do this; e.g. in Ukkonen's algorithm:
$
mark one by one, following down the characters on the edge you have created, as well as the chain of suffix links in case of more complex repetitionsThe key to why the Ukkonen algorithm (details here) is O(n) is that it maintains a "memory" of where to make inserts, in the form of (a) a pointer to where the previous insert was made, and (b) a network of suffix links. That network can be vast, but there is only one suffix link per internal node, so it's still O(n) in size.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With