Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When do we actually use a Trie?

I am starting to read about Trie. I got also references from friends here in: Tutorials on Trie

I am not clear on the following:
It seems that to go on and use a Trie one assumes that all the input strings that will be the search space and used to build the Trie are separated in distinct word boundaries.
E.g. all the example tutorials I have seen use input such as:

S={ball, bid, byte, car, cat, mac, map etc...}

Then we build the trie from S and do our searches (really fast)
My question is: How did we end up with S to begin with?
I mean before starting to read about tries I imagined that S would be an arbitrarily long text e.g. A Shakespeare passage.

Then using a Trie we could find things really fast.
But it seems this is not the case.

Is the assumption here that the input passage (of Shakespeare for example) is pre-processed first extracting all the words to get S?

So if one wants to search for patterns (same way as you do when you Google and see all pages having also spaces in your search query) a Trie is not appropriate?
When can we know if a Trie is the data structure that we can actually use?

like image 366
Jim Avatar asked May 22 '12 07:05

Jim


2 Answers

Tries are useful where you have a fixed dictionary you want to look up quickly. Compared to a hashtable it may require less storage for a large dictionary but may well take longer to look up. One example place I have used it is for mapping URLs to operations on a web server were there may be inheritance of functionality based on the prefix. Here recursing down a trie enables appropriate lookup of all of the methods that need to be called for a particular url. It would also be efficient for storing a dictionary.

For doing text searches you would typically represent documents using a token vector of leximes with weights (perhaps based on occurance frequency), and then search against that to get a ranking of documents against a particular search vector. There a number of standard libraries to do this which I would suggest using rather than writing your own - particularly for removing stopwords, dealing with synonyms and stemming.

like image 199
JARW Avatar answered Oct 16 '22 18:10

JARW


We can use tries for sub string searching in linear time, without pre processing the string every time. You can get a best tutorial on suffix tree generation @ Ukkonen's suffix tree algorithm in plain English?

like image 45
Abhishek Avatar answered Oct 16 '22 18:10

Abhishek