Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do Suffix Trees work?

I'm going through the data structures chapter in The Algorithm Design Manual and came across Suffix Trees.

The example states:

Input:

XYZXYZ$
 YZXYZ$
  ZXYZ$
   XYZ$
    YZ$
     Z$
      $

Output:

enter image description here

I'm not able to understand how that tree gets generated from the given input string. Suffix trees are used to find a given Substring in a given String, but how does the given tree help towards that? I do understand another given example of a trie shown below, but if the below trie gets compacted to a suffix tree, then what would it look like?

enter image description here

like image 352
Anthony Avatar asked Jun 13 '12 16:06

Anthony


1 Answers

The standard efficient algorithms for constructing a suffix tree are definitely nontrivial. The main algorithm for doing so is called Ukkonen's algorithm and is a modification of the naive algorithm with two extra optmizations. You are probably best off reading this earlier question for details on how to build it.

You can construct suffix trees by using the standard insertion algorithms on radix tries to insert each suffix into the tree, but doing so wlil take time O(n2), which can be expensive for large strings.

As for doing fast substring searching, remember that a suffix tree is a compressed trie of all the suffixes of the original string (plus some special end-of-string marker). If a string S is a substring of the initial string T and you had a trie of all the suffixes of T, then you could just do a search to see if T is a prefix of any of the strings in that trie. If so, then T must be a substring of S, since all its characters exist in sequence somewhere in T. The suffix tree substring search algorithm is precisely this search applied to the compressed trie, where you follow the appropriate edges at each step.

Hope this helps!

like image 108
templatetypedef Avatar answered Sep 20 '22 12:09

templatetypedef