Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How Do I Choose Between a Hash Table and a Trie (Prefix Tree)?

People also ask

When would you use a trie?

A Trie (usually pronounced “try”) is a tree data structure optimized for a specific type of searching. You use a Trie when you want to take a partial value and return a set of possible complete values. The classic example for this is an Autocomplete.

Which is an advantage of using trie over hashing for word search problem?

Trie allows us to input and finds strings in O(l) time, where l is the length of a single word. It is faster as compared to both hash tables and binary search trees. It provides alphabetical filtering of entries by the key of the node and hence makes it easier to print all words in alphabetical order.

What is the difference between a tree and a trie?

A tree is a general structure of recursive nodes. There are many types of trees. Popular ones are binary tree and balanced tree. A Trie is a kind of tree, known by many names including prefix tree, digital search tree, and retrieval tree (hence the name 'trie').

How trie data structure makes patterns match faster?

The trie data structure provides fast pattern matching for string data values. Using trie, we bring the search complexity of a string to the optimal limit. A trie searches a string in O(m) time complexity, where m is the length of the string. In trie, every node except the root stores a character value.


Advantages of tries:

The basics:

  • Predictable O(k) lookup time where k is the size of the key
  • Lookup can take less than k time if it's not there
  • Supports ordered traversal
  • No need for a hash function
  • Deletion is straightforward

New operations:

  • You can quickly look up prefixes of keys, enumerate all entries with a given prefix, etc.

Advantages of linked structure:

  • If there are many common prefixes, the space they require is shared.
  • Immutable tries can share structure. Instead of updating a trie in place, you can build a new one that's different only along one branch, elsewhere pointing into the old trie. This can be useful for concurrency, multiple simultaneous versions of a table, etc.
  • An immutable trie is compressible. That is, it can share structure on the suffixes as well, by hash-consing.

Advantages of hashtables:

  • Everyone knows hashtables, right? Your system will already have a nice well-optimized implementation, faster than tries for most purposes.
  • Your keys need not have any special structure.
  • More space-efficient than the obvious linked trie structure (see comments below)

It all depends on what problem you're trying to solve. If all you need to do is insertions and lookups, go with a hash table. If you need to solve more complex problems such as prefix-related queries, then a trie might be the better solution.


Everyone knows hash table and its uses but it is not exactly constant look up time , it depends on how big the hash table is , the computational complexity of the hash function.

Creating huge hash tables for efficient lookup is not an elegant solution in most of the industrial scenarios where even small latency/scalability matters (e.g.: high frequency trading). You have to care about the data structures to be optimized for space it takes up in memory too to reduce cache miss.

A very good example where trie better suits the requirements is messaging middleware . You have a million subscribers and publishers of messages to various categories (in JMS terms - Topics or exchanges) , in such cases if you want to filter out messages based on topics (which are actually strings) , you definitely do not want create hash table for the million subscriptions with million topics . A better approach is store the topics in trie , so when filtering is done based on topic match , its complexity is independent of number of topics/subscriptions/publishers (only depends on the length of string). I like it because you can be creative with this data structure to optimize space requirements and hence have lower cache miss.


Use a tree:

  1. If you need auto complete feature
  2. Find all words beginning with 'a' or 'axe' so on.
  3. A suffix tree is a special form of a tree. Suffix trees have a whole list of advantages that hash cannot cover.

There's something I haven't seen anyone mention explicitly that I think is important to keep in mind. Both hash tables and tries of various kinds will typically have O(k) operations, where k is the length of the string in bits (or equivalently in chars).

This is assuming you have a good hash function. If you don't want "farm" and "farm animals" to hash to the same value, then the hash function will have to use all the bits of the key, and so hashing "farm animals" should take about twice as long as "farm" (unless you're in some sort of rolling hash scenario, but there are somewhat similar operation-saving scenarios with tries too). And with a vanilla trie, it's clear why inserting "farm animals" will take about twice as long as just "farm". In the long run it's true with compressed tries as well.


Insertion and lookup on a trie is linear with the lengh of the input string O(s).

A hash will give you a O(1) for lookup ans insertion, but first you have to calculate the hash based on the input string which again is O(s).

Conclussion, the asymptotic time complexity is linear in both cases.

The trie has some more overhead from data perspective, but you can choose a compressed trie which will put you again, more or less on a tie with the hash table.

To break the tie ask yourself this question: Do i need to lookup for full words only? Or do I need to return all words matching a prefix? (As in a predictive text input system ). For the first case, go for a hash. It is simpler and cleaner code. Easier to test and maintain. For a more ellaborated use case where prefixes or sufixes matter, go for a trie.

And if you do it just for fun, implementing a trie would put a Sunday afternoon to a good use.