Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Trie saves space, but how?

I am confused as to how the Trie implementation saves space & stores data in most compact form!

If you look at the tree below. When you store a character at any node, you also need to store a reference to that & thus for each character of the string you need to store its reference. Ok we saved some space when a common character arrived but we lost more space in storing a reference to that character node.

So isn't there a lot of structural overhead to maintain this tree itself ? Instead if a TreeMap was used in place of this, lets say to implement a dictionary, this could have saved a lot more space as string would be kept in one piece hence no space wasted in storing references, isn't it ?

enter image description here

like image 624
Rajat Gupta Avatar asked Nov 25 '11 06:11

Rajat Gupta


People also ask

How is a trie stored in memory?

whatever we are storing in trie can be easily store in hashtable also @Design_Gurus. For each trie node it will be acting as key to table and values would be list of k strings acting as top k items. Since out memory footprint esimated is 25 gb we can have atmax 25*k which is still in GB, it would be better than trie.

What is the space complexity of trie?

Space complexity: O(m) (In the worst case newly inserted key doesn't share a prefix with the the keys already inserted in the trie. We have to add m new nodes, which takes us O(m) space.)

How do you optimize trie data structure?

One way to implementing Trie is linked set of nodes, where each node contains an array of child pointers, one for each symbol in the alphabet. This is not efficient in terms of time as we can't quickly find a particular child. The efficient way is an implementation where we use hash map to store children of a node.

How does trie data structure work?

A trie is a tree-like data structure whose nodes store the letters of an alphabet. By structuring the nodes in a particular way, words and strings can be retrieved from the structure by traversing down a branch path of the tree. Tries in the context of computer science are a relatively new thing.


1 Answers

To save space when using a trie, one can use a compressed trie (also known as a patricia trie or radix tree), for which one node can represent multiple characters:

In computer science, a radix tree (also patricia trie or radix trie) is a space-optimized trie data structure where each node with only one child is merged with its child. The result is that every internal node has at least two children. Unlike in regular tries, edges can be labeled with sequences of characters as well as single characters. This makes them much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes.

Example of a radix tree:

radix tree or patricia trie

Note that a trie is usually used as an efficient data structure for prefix matching on a set of strings. A trie can also be used as an associative array (like a hash table) where the key is a string.

like image 160
David Hu Avatar answered Sep 29 '22 07:09

David Hu