Suffix trees and tries: what is the difference?

Tags:

algorithm

data-structures

suffix-tree

trie

I am reading about tries, commonly known as prefix trees, and about suffix trees.
Although I have found code for a trie, I cannot find an example for a suffix tree. I also get the feeling that the code that builds a trie is the same as the code that builds a suffix tree, the only difference being that the former stores prefixes and the latter stores suffixes.
Is this true? Can anyone help me clear this up? Example code would be a great help!

338

asked Dec 15 '12 16:12

Cratylus

2 Answers

A suffix tree can be viewed as a data structure built on top of a trie where, instead of just adding the string itself into the trie, you would also add every possible suffix of that string. As an example, if you wanted to index the string banana in a suffix tree, you would build a trie with the following strings:

banana
anana
nana
ana
na
a

Once that's done, you can search for any n-gram (substring) and check whether it is present in your indexed string. In other words, an n-gram search is a prefix search over all possible suffixes of your string.
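A minimal sketch of this naive construction, assuming Python and a dict-of-dicts node representation (both my choices; `build_suffix_trie` and `contains_substring` are hypothetical names). It runs ordinary trie insertion once per suffix, then answers substring queries by walking the pattern down from the root:

```python
def build_suffix_trie(text):
    """Naive suffix trie: insert every suffix of `text` into an ordinary trie.

    Each node is a dict mapping a character to its child node."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:  # ordinary trie insertion of the suffix text[i:]
            node = node.setdefault(ch, {})
    return root


def contains_substring(trie, pattern):
    """`pattern` occurs in the text iff it is a prefix of some suffix,
    i.e. iff it can be walked from the root without falling off the trie."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


if __name__ == "__main__":
    trie = build_suffix_trie("banana")
    for p in ("nan", "ana", "ban", "nab", "bananas"):
        print(p, contains_substring(trie, p))
```

The inner loop is exactly the trie insertion the question has in mind; what changes is that it is repeated for every suffix, which is also why this version costs O(n^2) for a string of length n.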

This is the simplest and slowest way to build a suffix tree. It turns out that there are many fancier variants of this data structure that improve space, build time, or both. I'm not well versed enough in this domain to give an overview, but you can start by looking into suffix arrays or this class on advanced data structures (lectures 16 and 18).
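For comparison, a rough sketch of a suffix array, one of the alternatives mentioned above, again in Python with hypothetical function names. It stores only the sorted starting positions of the suffixes (n integers) and locates a pattern with two binary searches. The construction here just sorts the suffixes, which is O(n^2 log n) in the worst case; it is not one of the faster construction algorithms.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of `text`, in lexicographic order.
    Naive: sorting by the full suffix string costs O(n^2 log n) worst case."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of `pattern` in `text`.

    Suffixes beginning with `pattern` form one contiguous block of `sa`,
    because truncating sorted suffixes to len(pattern) keeps them sorted."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)                                 # [5, 3, 1, 0, 4, 2]
    print(find_occurrences(text, sa, "ana"))  # [1, 3]
```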

This answer also does a wonderful job of explaining a variant of this data structure.

176

answered Sep 21 '22 23:09

Ze Blob

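If you imagine a trie into which you put all of a word's suffixes, you would be able to query it for the string's substrings very easily. This is the main idea behind a suffix tree: it's basically a "suffix trie". But with this naive approach, constructing the tree for a string of length n takes O(n^2) time and, in the worst case, O(n^2) memory.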
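To make the quadratic cost concrete, a small Python sketch (the helper name `suffix_trie_size` is hypothetical) that builds the naive suffix trie and counts its nodes. Every node other than the root spells one distinct substring, so for a string of n distinct characters nothing is shared and the count is 1 + n(n+1)/2:

```python
def suffix_trie_size(text):
    """Build the naive suffix trie of `text` and return its node count.

    Each node other than the root spells one distinct substring of `text`,
    so the result is 1 + (number of distinct non-empty substrings)."""
    root = {}
    nodes = 1  # count the root
    for i in range(len(text)):
        node = root
        for ch in text[i:]:  # insert suffix text[i:]
            if ch not in node:
                node[ch] = {}
                nodes += 1
            node = node[ch]
    return nodes


if __name__ == "__main__":
    print(suffix_trie_size("banana"))  # 16
    for n in (10, 100, 300):
        # n distinct characters: no two suffixes share a prefix, so nothing is reused
        text = "".join(chr(0x100 + i) for i in range(n))
        print(n, suffix_trie_size(text), 1 + n * (n + 1) // 2)
```

"banana" gives 16 nodes because it has 15 distinct substrings; the distinct-character strings show the count growing with the square of n.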

Since all the entries in this tree are suffixes of the same string, they share a lot of information, so there are optimized algorithms that let you build the tree more efficiently. Ukkonen's algorithm, for example, builds a suffix tree online in O(n) time (for a fixed-size alphabet).
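A minimal sketch of where the space saving comes from, assuming Python and a unique terminator `$` (both my choices; `Node`, `build_compressed_suffix_tree` and the helpers are hypothetical names). This is not Ukkonen's algorithm: it still inserts suffixes one at a time in O(n^2) time. What it shows is the compressed structure a real suffix tree uses: chains of single-child nodes are merged into one edge, and each edge label is stored as a `(start, end)` pair of indices into the text, so the tree has fewer than 2n nodes instead of O(n^2). Ukkonen's algorithm produces the same tree in linear time.

```python
class Node:
    """A suffix-tree node. The edge leading into it is labelled text[start:end]."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.children = {}  # first character of a child's edge label -> child


def build_compressed_suffix_tree(text):
    """Insert every suffix of `text` one at a time, merging single-child
    chains. O(n^2) time; this is NOT Ukkonen's algorithm."""
    # A unique terminator guarantees no suffix is a prefix of another,
    # so every suffix ends at its own leaf.
    assert text.endswith("$") and text.count("$") == 1
    n = len(text)
    root = Node(0, 0)
    for i in range(n):
        node, pos = root, i  # pos: next character of suffix text[i:] to match
        while True:
            first = text[pos]
            child = node.children.get(first)
            if child is None:
                # No edge starts with this character: hang a new leaf here.
                node.children[first] = Node(pos, n)
                break
            # Walk along the child's edge while the suffix agrees with it.
            k = child.start
            while k < child.end and text[k] == text[pos]:
                k += 1
                pos += 1
            if k == child.end:
                node = child  # whole edge matched: continue below the child
                continue
            # Mismatch inside the edge: split it at k and branch off a leaf.
            mid = Node(child.start, k)
            node.children[first] = mid
            child.start = k
            mid.children[text[k]] = child
            mid.children[text[pos]] = Node(pos, n)
            break
    return root


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())


def count_leaves(node):
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children.values())


def tree_contains(text, root, pattern):
    """Walk `pattern` down the edges; it is a substring iff the walk succeeds."""
    node, i = root, 0
    while i < len(pattern):
        child = node.children.get(pattern[i])
        if child is None:
            return False
        for k in range(child.start, child.end):
            if i == len(pattern):
                return True  # pattern ended in the middle of an edge
            if text[k] != pattern[i]:
                return False
            i += 1
        node = child
    return True


if __name__ == "__main__":
    s = "banana$"
    root = build_compressed_suffix_tree(s)
    print(count_nodes(root), count_leaves(root))  # 11 7
    print(tree_contains(s, root, "nan"), tree_contains(s, root, "nab"))  # True False
```

For "banana$" the compressed tree has 11 nodes (one leaf per suffix plus 3 internal nodes and the root), while the naive suffix trie of "banana" already needs 16; the gap widens quadratically as the string grows.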

answered Sep 20 '22 23:09

Juan Lopes
