Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Suffix tree VS Tries - in plain English , what's the difference?

I've taken a look at this questions , but I still don't see the difference between a Suffix tree and a Trie .

Both have all the substrings of a given string , so where do they differ from one another ?

like image 884
JAN Avatar asked Aug 01 '13 06:08

JAN


People also ask

What is the difference between suffix trie and suffix tree?

The trie is used to build the suffix tree and that is why most textbooks only provide code for tries. But this is the worst-case implementation eh? @Cratylus Suffix trees are most useful on very large strings (eg. indexing all the works of Shakespeare) where O(n^2) space and build time is simply not going to cut it.

What is trie and explain about suffix tree?

In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations.

What is suffix trie used for?

Trie. In a trie, each alphabet of all the strings in the prescribed string is parsed one by one and represented by a single node. If two or more words start with the same sub-string, the identical sub-string is represented by the same chain of nodes.


1 Answers

Suffix tree - a large text is given. Query - search many times any words in the text.
Example: You are implementing your own cool text redactor with solitaire and kittens=) You are going to implement CTRL+F function. Probable implementation - index document (create suffix tree) and when user looks for some word - search it in a tree.

Trie - a large text is given. Query - search many times predefined words in the text.
Example: You are implementing your own cool facebook with poker and Justin Bieber's fans=) You don't want your users to post swear words. Probable implementation - create trie of swear words. When users types some text search forbiiden words and replace them with *.

In general, suffix tree= trie. Suffix tree is a trie of all suffixes of some word. When you want search something in a dictionary use trie. When you search something in a solid text use suffix tree.

Important note - building/rebuilding suffix tree for large text is complex operation. Once you changed your text you have to recreate suffix tree. Rebuilding trie is a trivial operation - just add new word in O(wordLength)

Conclusion
Suffix tree. You know nothing about future queries.Spend time once for creating suffix tree and you are ready to process requests. Known info is a text. Situations where requests are unknown but text is given and not going to change oftenly are candidates for using suffix tree. For example you can't use trie (aho-corasick algo) in CTRL+F implementation - because you just can't give dictionary as the input for aho-corasick's algo based on the trie.

Trie. You know nothing about text where you will perform searching. But you know future queries. Spend time for preprocessing/preparing data structure for your queries and you can perform search queries in any text. For example in replacing forbidden words task you don't know what text user will post but you know forbidden words. Creating suffix tree for each short new post would be too stupid=) UPD As @mightyWOZ noticed in comment, pure trie is not applicable but we can use Aho-Corasick algo which is extension over trie. So, statement is still true for tries - there exist approach (Aho-Corasick) which uses trie as a base, pre-processes queries and then can handle any text.

like image 193
Baurzhan Avatar answered Nov 05 '22 20:11

Baurzhan