I've been studying tries and looking at their advantages and disadvantages. They're quite useful in many practical applications such as dictionaries and spell checkers, due to their constant O(m) look-ups (where m is the length of the string) and other advantages like ordered retrieval of strings and finding common prefixes. So the advantages are pretty clear to me, but the limitations are a bit confusing.
I'm following this link: https://en.wikipedia.org/wiki/Trie
Drawbacks listed here are:
Follow-up question: Why is there a scenario involving secondary storage? Aren't tries also supposed to be stored in main memory? If they're stored in secondary storage, then there's no point in using a trie anyway, as disk access will always be slower.
Follow-up question: Is it because tries contain more references/pointers connecting each character to the next one, which would consume more bytes than storing each string whole? (I got this reason from one of the answers here.) Can anyone elaborate on this too?
I'd really appreciate some help here. Thanks.
Unlike a binary search tree, nodes in the trie do not store their associated key. Instead, a node's position in the trie defines the key with which it is associated. This distributes the value of each key across the data structure, and means that not every node necessarily has an associated value.
Tries: A trie is a specialized and useful data structure organized around the prefixes of strings. The name comes from the word "retrieval". A trie stores strings and can be visualized as a tree.
The complexity of creating a trie is O(W*L), where W is the number of words and L is the average word length: inserting each of the W words takes, on average, L steps, one per character.
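The O(W*L) build cost can be checked with a minimal sketch. It assumes a nested-dict trie and a hypothetical end-of-word marker `END`; neither comes from the text above, and `build_trie` is just an illustrative name. The counter records one step per character inserted.

```python
END = "$"  # assumed end-of-word marker; assumes "$" never occurs in a word


def build_trie(words):
    """Insert every word into a nested-dict trie.

    Returns (root, steps), where steps counts one unit of work
    per character inserted.
    """
    root = {}
    steps = 0
    for word in words:
        node = root
        for ch in word:
            # Reuse the child for ch if it exists, otherwise create it.
            node = node.setdefault(ch, {})
            steps += 1
        node[END] = True  # mark that a complete word ends here
    return root, steps


words = ["tea", "ten", "to"]
root, steps = build_trie(words)
print(steps, sum(len(w) for w in words))
```

The step count equals the total number of characters across all words, which is W times the average length L.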
A trie is an ordered tree where (i) the root represents the empty string (""), (ii) each node other than the root is labeled with a character, (iii) the children of a node are lexicographically ordered, and (iv) the paths from the root to the leaves yield the strings.
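First, "constant O(m) look-ups" is meaningless: O(m) is not constant. Lookup time in a trie is O(m), so it grows with the length of the string you're looking up.
A well constructed hash table (i.e. a good hash function and a reasonable load factor) has O(1) expected lookup time with respect to the number of stored strings. Hashing the key still reads all m of its characters, but that is a single pass over the string rather than m node-to-node hops.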
Assuming competent construction, looking up a string in a hash table will be much faster than looking it up in a trie.
Tries and hash tables are used for different things. If all you want is the ability to lookup a word, then a hash table will be faster. If you want to find common prefixes, ordered retrieval, or do similar things, then you want a trie.
A hash table can look up individual strings very quickly. It's like a thoroughbred racehorse. That's all it can do. A trie, on the other hand, is a workhorse that can do a lot of things. It'll never be as fast at lookups as a hash table, but it can do lots of things that the hash table can't do.
For example, finding all the words that start with "pre" will take O(n) time with a hash table because you have to examine every word. With a trie, it takes three probes to find the subtree that contains all of those words, and then all you have to do is traverse that subtree. Sure, the worst case is O(n), but that's only if all the words in your trie start with "pre".
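Here is a rough sketch of that comparison. The names `TrieNode`, `insert`, `words_with_prefix`, and `scan_with_prefix` are hypothetical, and the word list is made up for illustration. The trie version walks one node per prefix character and then visits only the matching subtree; the set version has to test every stored word.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if a stored word ends at this node


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def words_with_prefix(root, prefix):
    """Return all stored words that start with prefix, sorted."""
    node = root
    for ch in prefix:  # one probe per prefix character
        node = node.children.get(ch)
        if node is None:
            return []
    results = []
    stack = [(node, prefix)]
    while stack:  # visit only the subtree below the prefix
        current, text = stack.pop()
        if current.is_word:
            results.append(text)
        for ch, child in current.children.items():
            stack.append((child, text + ch))
    # Sorted here for a stable result; walking children in sorted
    # order would produce the same ordering directly.
    return sorted(results)


def scan_with_prefix(word_set, prefix):
    """Hash-set version: every stored word must be checked."""
    return sorted(w for w in word_set if w.startswith(prefix))


words = ["prefix", "preview", "press", "trie", "tree", "hash"]
root = TrieNode()
for w in words:
    insert(root, w)

print(words_with_prefix(root, "pre"))
print(scan_with_prefix(set(words), "pre"))
```

Both calls return the same words, but only the set version touches "trie", "tree", and "hash" along the way.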
Whereas it's true that going to disk will be slower than if the entire trie were in memory, it's wrong to say that a disk-based trie offers no advantage over alternatives. If the data won't fit in memory, then no matter what data structure you use, you'll need some external (i.e. non-memory) storage. The fact that your data access is slower when it's on the disk does not fundamentally change the advantages or disadvantages of trie vs. hash table. For example, a disk-based trie will still be faster than a disk-based hash table when it comes to finding all the words with a particular prefix.
A hash table's overhead is typically a constant multiple of the number of words it contains. That is, in addition to the memory required to store the strings, there is per-string overhead to store the mapping between hash code and string.
Memory for a trie is a little more involved. In the worst case, there is one node per character, and each node carries its own child references. All those little node allocations start adding up. Imagine a dictionary that contains 200,000 words, and the average word length is five characters. With no shared prefixes, that's a million nodes of overhead.
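To make the arithmetic concrete, here is a small sketch. It relies on the fact that a plain trie has one node (excluding the root) per distinct non-empty prefix of the stored words. The helper `count_trie_nodes` and the sample words are assumptions for illustration only.

```python
def count_trie_nodes(words):
    """Count trie nodes, excluding the root.

    Each node corresponds to exactly one distinct non-empty prefix
    of some stored word.
    """
    prefixes = set()
    for word in words:
        for i in range(1, len(word) + 1):
            prefixes.add(word[:i])
    return len(prefixes)


# Worst case from the text: no shared prefixes at all.
worst_case_nodes = 200_000 * 5  # one node per character

# Small example where words share prefixes.
words = ["car", "card", "care", "cat", "dog"]
upper_bound = sum(len(w) for w in words)  # one node per character
actual = count_trie_nodes(words)          # shared prefixes counted once

print(worst_case_nodes, upper_bound, actual)
```

Shared prefixes bring the real node count below the one-node-per-character bound, but each remaining node still pays for its child references.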
Fortunately, there are ways to greatly compress a trie, without losing much, if any, performance. The resulting data structure is much smaller and more cache-friendly than a naively constructed trie.
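The answer doesn't say which compression it has in mind. One common approach, shown here purely as an assumption, is path compression (a radix tree): chains of single-child nodes are merged into one edge labelled with a substring. The sketch uses nested dicts, a hypothetical `END` marker, and made-up helper names.

```python
END = "$"  # assumed end-of-word marker; assumes "$" never occurs in a word


def build(words):
    """Build a plain nested-dict trie, one character per edge."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def compress(node):
    """Merge chains of single-child nodes into one multi-character edge."""
    out = {}
    for label, child in node.items():
        # Keep absorbing the child while it has exactly one outgoing
        # edge and no word ends at it.
        while len(child) == 1 and END not in child:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        out[label] = compress(child)
    return out


def count_nodes(node):
    """Count nodes below this one, ignoring end-of-word markers."""
    return sum(1 + count_nodes(child)
               for label, child in node.items() if label != END)


def contains(node, word):
    """Exact-match lookup; works on both plain and compressed tries."""
    if word == "":
        return END in node
    for label, child in node.items():
        if label != END and word.startswith(label):
            return contains(child, word[len(label):])
    return False


words = ["romane", "romanus", "romulus"]
plain = build(words)
packed = compress(plain)
print(count_nodes(plain), count_nodes(packed))
```

Lookups still follow one edge per branching point, so the merged structure answers the same queries with far fewer nodes.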