Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating suffix tree of string S[2..m] from suffix tree of string S[1..m]

Is there a fast (O(1) time complexity) way of generating a suffix tree of string S[2..m] from suffix tree of string S[1..m]?

I am familiar with Ukkonen's, so I know how to make fast suffix tree of string S[1..m+1] from suffix tree of string S[1..m], but I couldn't apply the algorithm for reverse situation.

like image 857
user2356167 Avatar asked May 06 '13 21:05

user2356167


1 Answers

Well, as @jogojapan says, to get the S[2..m] tree from the S[1..m] tree we need to:

  • Find the position-0 leaf L.
  • If L has more than one sibling, delete the pointer from L's parent to L
  • If L has exactly one sibling, change the pointer from L's grandparent to L's parent so it instead points to L's sibling.

@jogojapan further suggests that you keep a pointer to the deepest leaf in the tree. There are two problems with that: L isn't necessarily the deepest leaf in the tree, as Wikipedia's example shows, and second if you want to be able to output the same type of data structure as you received, once removing L you need to find the new position-0 leaf, which will take O(m) time anyway.

(What you could do is construct an array of pointers to each leaf in O(m) time and count-sort them by position in another O(m) time. Then you'd be able to construct all the trees { S[t..n] : 1 <= t <= m } in constant amortized time each)

Assuming you're not interested in amortized time though, let's prove what you ask is impossible.

  • We know any algorithm to modify the suffix tree of S[1..m] must start at the root: it can't start anywhere else because we know nothing about the underlying concrete data structure, and we don't know that the tree's nodes have parent pointers, so the only position the whole tree is accessible from is the root.
  • We also know that it must locate the position-0 leaf before it can hope to modify the data structure into the suffix tree for S[2..m]. To do this, it must obviously traverse every node between the root and the position-0 leaf.
  • Thing is, consider the suffix tree of a^m (the character a repeated m times): the length of the path is m-1. So any algorithm must visit at least m-1 nodes, and therefore take O(m) time in the worst case.
like image 143
Andy Jones Avatar answered Oct 05 '22 12:10

Andy Jones