Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Disadvantages of tries

I've been studying tries and checking out their advantages and disadvantages. They're quite useful in many practical applications like dictionary, spell checkers etc due to their constant O(m) look-ups (where m is length of the string) and other advantages like providing ordered retrieval of strings, and getting common prefixes. So, the advantages are pretty clear to me, but the limitations are a bit confusing.

I'm following this link : https://en.wikipedia.org/wiki/Trie

Drawbacks listed here are:

  1. Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random-access time is high compared to main memory.

Follow up question - Why is there a scenario involving secondary storage? Aren't tries also supposed to be stored in main memory. If they're stored in secondary storage, then there's no use of using trie anyways as disk access will always cause greater times.

  1. Some tries can require more space than a hash table, as memory may be allocated for each character in the search string, rather than a single chunk of memory for the whole entry, as in most hash tables.

Follow-up question : Is it due to the fact that tries would contain more references/pointers for connecting each character to next one, and that'd consume more bytes than if it was stored as a whole string? (I got this reason from one of the answers here). Can anyone elaborate this too?

I'd really appreciate some help here. Thanks.

like image 712
gaurav jain Avatar asked Sep 29 '15 04:09

gaurav jain


People also ask

What is the difference between trees and tries?

Unlike a binary search tree, nodes in the trie do not store their associated key. Instead, a node's position in the trie defines the key with which it is associated. This distributes the value of each key across the data structure, and means that not every node necessarily has an associated value.

What is the purpose of tries?

Tries: Tries are an extremely special and useful data-structure that are based on the prefix of a string. They are used to represent the “Retrieval” of data and thus the name Trie. A Trie is a special data structure used to store strings that can be visualized like a graph.

What is the time complexity of trie?

The complexity of creating a trie is O(W*L) , where W is the number of words, and L is an average length of the word: you need to perform L lookups on the average for each of the W words in the set.

Which of the following about tries is correct?

9. Which of the following is true about the trie? Explanation: A trie is an ordered tree where (i) the root represents an empty string(“”) (ii) each node other than root is labeled with a character (iii) the children of a nodes are lexicographically ordered (iv) the paths from the leaves to the root yields the strings.


1 Answers

First, "constant O(m) look-ups" is meaningless. Lookup time in a trie is O(m): it depends on the length of the string you're looking up.

A well constructed hash table (i.e. a good hash function and a reasonable load factor) has O(1) lookup time.

Assuming competent construction, looking up a string in a hash table will be much faster than looking it up in a trie.

Tries and hash tables are used for different things. If all you want is the ability to lookup a word, then a hash table will be faster. If you want to find common prefixes, ordered retrieval, or do similar things, then you want a trie.

A hash table can look up individual strings very quickly. It's like a thoroughbred racehorse. That's all it can do. A trie, on the other hand, is a workhorse that can do a lot of things. It'll never be as fast at lookups as a hash table, but it can do lots of things that the hash table can't do.

For example, finding all the words that start with "pre" will take O(n) time with a dictionary because you have to search all of the words. With a trie, it takes three probes to find the subtree that contains all of those words, and then all you have to do is traverse that subtree. Sure, the worst case is O(n), but that's only if all the words in your trie start with "pre".

Whereas it's true that going to disk will be slower than if the entire trie were in memory, it's wrong to say that a disk-based trie offers no advantage over alternatives. If the data won't fit in memory, then no matter what data structure you use, you'll need some external (i.e. non-memory) storage. The fact that your data access is slower when it's on the disk does not fundamentally change the advantages or disadvantages of trie vs. hash table. For example, a disk-based trie will still be faster than a disk-based hash table when it comes to finding all the words with a particular prefix.

A hash table's overhead is typically a constant multiple of the number of words it contains. That is, in addition to the memory required to store the strings, there is per-string overhead to store the mapping between hash code and string.

Memory for a trie is a little more involved. In the worst case, there is one node per character. All those little node allocations start adding up. Imagine a dictionary that contains 200,000 words, and the average word length is five characters. That's a million nodes of overhead.

Fortunately, there are ways to greatly compress a trie, without losing much, if any, performance. The resulting data structure is much smaller and more cache-friendly than a naively constructed trie.

like image 200
Jim Mischel Avatar answered Oct 20 '22 17:10

Jim Mischel