I'm doing some work with Ukkonen's algorithm for building suffix trees, but I'm not understanding some parts of the author's explanation for it's linear-time complexity.
I have learned the algorithm and have coded it, but the paper which I'm using as the main source of information (linked bellow) is kinda confusing at some parts so it's not really clear for me why the algorithm is linear.
Any help? Thanks.
Link to Ukkonen's paper: http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
The algorithm begins with an implicit suffix tree containing the first character of the string. Then it steps through the string, adding successive characters until the tree is complete. This order addition of characters gives Ukkonen's algorithm its "on-line" property.
In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations.
We build a suffix tree by following each suffix and creating an edge for each character, starting with a top node. If the new suffix to be put in the tree begins with a set of characters that are already in the tree, we follow those characters until we have a different one, creating a new branch.
Find a copy of Gusfield's string algorithms textbook. It's got the best exposition of the suffix tree construction I've seen. The linearity is a surprising consequence of a number of optimizations of the high-level algorithm.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With