Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is worst case time complexity of constructing suffix tree linear?

I have trouble understanding how the worst case time complexity of constructing a suffix tree is linear - particularly when we need to build a suffix tree for a string that may be composed of repeating single character such as "aaaaa".

Even if I were to construct a compressed suffix tree for "aaaaa", I won't be really able to compress any nodes since no two edges starting out of a node can have string-labels beginning with the same character.

This would result in a suffix tree of height 5, and at each insertion of the suffix, I would need to keep traversing from the root to the leaf.

Here was how I approached: suffixes: a, aa, aaa, aaaa, aaaaa

Create root node, create an edge bearing 'a' and connect this to a new node, where to its left bears "$", and repeat this process until we can aaaaa.

This would result in O(n^2) instead of O(n). What am I missing here?

like image 283
sia831 Avatar asked Oct 19 '22 06:10

sia831


1 Answers

I agree with the comments, but here are some more details:

The procedure you describe for forming the aaaaa tree is O(n), not O(n^2). Node and leaf creation are constant-time operations, and you perform them n==5 times. Your assumption of O(n^2) seems to be based on the idea that you'd traverse from root to leaf in each step, but there is no need to do this; e.g. in Ukkonen's algorithm:

  • You keep a pointer to the node you left off with before inserting the next
  • And in case of repetitions you don't perform any work until the repetitions end, and then you make insertions of the final $ mark one by one, following down the characters on the edge you have created, as well as the chain of suffix links in case of more complex repetitions

The key to why the Ukkonen algorithm (details here) is O(n) is that it maintains a "memory" of where to make inserts, in the form of (a) a pointer to where the previous insert was made, and (b) a network of suffix links. That network can be vast, but there is only one suffix link per internal node, so it's still O(n) in size.

like image 170
jogojapan Avatar answered Oct 21 '22 23:10

jogojapan