Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding Ukkonen's algorithm for suffix trees [duplicate]

I'm doing some work with Ukkonen's algorithm for building suffix trees, but I'm not understanding some parts of the author's explanation for it's linear-time complexity.

I have learned the algorithm and have coded it, but the paper which I'm using as the main source of information (linked bellow) is kinda confusing at some parts so it's not really clear for me why the algorithm is linear.

Any help? Thanks.

Link to Ukkonen's paper: http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf

like image 313
Vinicius Pinto Avatar asked Aug 20 '09 07:08

Vinicius Pinto


People also ask

How does Ukkonen's algorithm work?

The algorithm begins with an implicit suffix tree containing the first character of the string. Then it steps through the string, adding successive characters until the tree is complete. This order addition of characters gives Ukkonen's algorithm its "on-line" property.

What is suffix tree in algorithm?

In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations.

How do you make a suffix tree?

We build a suffix tree by following each suffix and creating an edge for each character, starting with a top node. If the new suffix to be put in the tree begins with a set of characters that are already in the tree, we follow those characters until we have a different one, creating a new branch.


1 Answers

Find a copy of Gusfield's string algorithms textbook. It's got the best exposition of the suffix tree construction I've seen. The linearity is a surprising consequence of a number of optimizations of the high-level algorithm.

like image 79
Jouni K. Seppänen Avatar answered Oct 12 '22 02:10

Jouni K. Seppänen