How Do I Choose Between a Hash Table and a Trie (Prefix Tree)?

People also ask

When would you use a trie?

A Trie (usually pronounced “try”) is a tree data structure optimized for a specific type of searching. You use a Trie when you want to take a partial value and return a set of possible complete values. The classic example for this is an Autocomplete.

Which is an advantage of using trie over hashing for word search problem?

Trie allows us to input and finds strings in O(l) time, where l is the length of a single word. It is faster as compared to both hash tables and binary search trees. It provides alphabetical filtering of entries by the key of the node and hence makes it easier to print all words in alphabetical order.

What is the difference between a tree and a trie?

A tree is a general structure of recursive nodes. There are many types of trees. Popular ones are binary tree and balanced tree. A Trie is a kind of tree, known by many names including prefix tree, digital search tree, and retrieval tree (hence the name 'trie').

How trie data structure makes patterns match faster?

The trie data structure provides fast pattern matching for string data values. Using trie, we bring the search complexity of a string to the optimal limit. A trie searches a string in O(m) time complexity, where m is the length of the string. In trie, every node except the root stores a character value.

Advantages of tries:

The basics:

Predictable O(k) lookup time where k is the size of the key
Lookup can take less than k time if it's not there
Supports ordered traversal
No need for a hash function
Deletion is straightforward

New operations:

You can quickly look up prefixes of keys, enumerate all entries with a given prefix, etc.

Advantages of linked structure:

If there are many common prefixes, the space they require is shared.
Immutable tries can share structure. Instead of updating a trie in place, you can build a new one that's different only along one branch, elsewhere pointing into the old trie. This can be useful for concurrency, multiple simultaneous versions of a table, etc.
An immutable trie is compressible. That is, it can share structure on the suffixes as well, by hash-consing.

Advantages of hashtables:

Everyone knows hashtables, right? Your system will already have a nice well-optimized implementation, faster than tries for most purposes.
Your keys need not have any special structure.
More space-efficient than the obvious linked trie structure (see comments below)

It all depends on what problem you're trying to solve. If all you need to do is insertions and lookups, go with a hash table. If you need to solve more complex problems such as prefix-related queries, then a trie might be the better solution.

Everyone knows hash table and its uses but it is not exactly constant look up time , it depends on how big the hash table is , the computational complexity of the hash function.

Creating huge hash tables for efficient lookup is not an elegant solution in most of the industrial scenarios where even small latency/scalability matters (e.g.: high frequency trading). You have to care about the data structures to be optimized for space it takes up in memory too to reduce cache miss.

A very good example where trie better suits the requirements is messaging middleware . You have a million subscribers and publishers of messages to various categories (in JMS terms - Topics or exchanges) , in such cases if you want to filter out messages based on topics (which are actually strings) , you definitely do not want create hash table for the million subscriptions with million topics . A better approach is store the topics in trie , so when filtering is done based on topic match , its complexity is independent of number of topics/subscriptions/publishers (only depends on the length of string). I like it because you can be creative with this data structure to optimize space requirements and hence have lower cache miss.

Use a tree:

If you need auto complete feature
Find all words beginning with 'a' or 'axe' so on.
A suffix tree is a special form of a tree. Suffix trees have a whole list of advantages that hash cannot cover.

There's something I haven't seen anyone mention explicitly that I think is important to keep in mind. Both hash tables and tries of various kinds will typically have O(k) operations, where k is the length of the string in bits (or equivalently in chars).

This is assuming you have a good hash function. If you don't want "farm" and "farm animals" to hash to the same value, then the hash function will have to use all the bits of the key, and so hashing "farm animals" should take about twice as long as "farm" (unless you're in some sort of rolling hash scenario, but there are somewhat similar operation-saving scenarios with tries too). And with a vanilla trie, it's clear why inserting "farm animals" will take about twice as long as just "farm". In the long run it's true with compressed tries as well.

Insertion and lookup on a trie is linear with the lengh of the input string O(s).

A hash will give you a O(1) for lookup ans insertion, but first you have to calculate the hash based on the input string which again is O(s).

Conclussion, the asymptotic time complexity is linear in both cases.

The trie has some more overhead from data perspective, but you can choose a compressed trie which will put you again, more or less on a tie with the hash table.

To break the tie ask yourself this question: Do i need to lookup for full words only? Or do I need to return all words matching a prefix? (As in a predictive text input system ). For the first case, go for a hash. It is simpler and cleaner code. Easier to test and maintain. For a more ellaborated use case where prefixes or sufixes matter, go for a trie.

And if you do it just for fun, implementing a trie would put a Sunday afternoon to a good use.

Related questions
                            
                                How to implement a queue with three stacks?
                            
                                What are efficient data structures and algorithms for simulating loaded dice?
                            
                                Why are λ-calculus optimal evaluators able to compute big modular exponentiations without formulas?
                            
                                How does one make a Zip bomb?
                            
                                Red black tree over avl tree
                            
                                What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?
                            
                                Which parallel sorting algorithm has the best average case performance?
                            
                                Why does Java's Arrays.sort method use two different sorting algorithms for different types?
                            
                                What is a good Hash Function?
                            
                                How to hash a string into 8 digits?
                            
                                Why use Dijkstra's Algorithm if Breadth First Search (BFS) can do the same thing faster?
                            
                                How does the Amazon Recommendation feature work?
                            
                                Algorithm to detect intersection of two rectangles?
                            
                                hash function for string
                            
                                Cartesian product of multiple arrays in JavaScript
                            
                                Modular multiplicative inverse function in Python
                            
                                Throwing cats out of windows
                            
                                Kotlin - Idiomatic way to remove duplicate strings from array?
                            
                                Implement Stack using Two Queues
                            
                                How to check whether two lists are circularly identical in Python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How Do I Choose Between a Hash Table and a Trie (Prefix Tree)?

Tags:

algorithm

hashtable

data-structures

trie

People also ask

Recent Activity

Donate For Us