A Trie (usually pronounced “try”) is a tree data structure optimized for a specific type of searching. You use a Trie when you want to take a partial value and return a set of possible complete values. The classic example for this is an Autocomplete.
Trie allows us to input and finds strings in O(l) time, where l is the length of a single word. It is faster as compared to both hash tables and binary search trees. It provides alphabetical filtering of entries by the key of the node and hence makes it easier to print all words in alphabetical order.
A tree is a general structure of recursive nodes. There are many types of trees. Popular ones are binary tree and balanced tree. A Trie is a kind of tree, known by many names including prefix tree, digital search tree, and retrieval tree (hence the name 'trie').
The trie data structure provides fast pattern matching for string data values. Using trie, we bring the search complexity of a string to the optimal limit. A trie searches a string in O(m) time complexity, where m is the length of the string. In trie, every node except the root stores a character value.
Advantages of tries:
The basics:
New operations:
Advantages of linked structure:
Advantages of hashtables:
It all depends on what problem you're trying to solve. If all you need to do is insertions and lookups, go with a hash table. If you need to solve more complex problems such as prefix-related queries, then a trie might be the better solution.
Everyone knows hash table and its uses but it is not exactly constant look up time , it depends on how big the hash table is , the computational complexity of the hash function.
Creating huge hash tables for efficient lookup is not an elegant solution in most of the industrial scenarios where even small latency/scalability matters (e.g.: high frequency trading). You have to care about the data structures to be optimized for space it takes up in memory too to reduce cache miss.
A very good example where trie better suits the requirements is messaging middleware . You have a million subscribers and publishers of messages to various categories (in JMS terms - Topics or exchanges) , in such cases if you want to filter out messages based on topics (which are actually strings) , you definitely do not want create hash table for the million subscriptions with million topics . A better approach is store the topics in trie , so when filtering is done based on topic match , its complexity is independent of number of topics/subscriptions/publishers (only depends on the length of string). I like it because you can be creative with this data structure to optimize space requirements and hence have lower cache miss.
Use a tree:
There's something I haven't seen anyone mention explicitly that I think is important to keep in mind. Both hash tables and tries of various kinds will typically have O(k)
operations, where k
is the length of the string in bits (or equivalently in chars).
This is assuming you have a good hash function. If you don't want "farm" and "farm animals" to hash to the same value, then the hash function will have to use all the bits of the key, and so hashing "farm animals" should take about twice as long as "farm" (unless you're in some sort of rolling hash scenario, but there are somewhat similar operation-saving scenarios with tries too). And with a vanilla trie, it's clear why inserting "farm animals" will take about twice as long as just "farm". In the long run it's true with compressed tries as well.
Insertion and lookup on a trie is linear with the lengh of the input string O(s).
A hash will give you a O(1) for lookup ans insertion, but first you have to calculate the hash based on the input string which again is O(s).
Conclussion, the asymptotic time complexity is linear in both cases.
The trie has some more overhead from data perspective, but you can choose a compressed trie which will put you again, more or less on a tie with the hash table.
To break the tie ask yourself this question: Do i need to lookup for full words only? Or do I need to return all words matching a prefix? (As in a predictive text input system ). For the first case, go for a hash. It is simpler and cleaner code. Easier to test and maintain. For a more ellaborated use case where prefixes or sufixes matter, go for a trie.
And if you do it just for fun, implementing a trie would put a Sunday afternoon to a good use.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With