Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which node data structure to use for a trie

I am using trie for the first time.I wanted to know which is the best data structure to use for a trie while deciding which is the next branch that one is supposed to traverse. I was looking among an array,a hashmap and a linked list.

like image 880
dilip devaraj Avatar asked Sep 16 '11 20:09

dilip devaraj


People also ask

What is a trie node?

A trie is a tree-like data structure whose nodes store the letters of an alphabet. By structuring the nodes in a particular way, words and strings can be retrieved from the structure by traversing down a branch path of the tree. Tries in the context of computer science are a relatively new thing.

How do I add a node in trie?

If the input key is new or an extension of the existing key, we need to construct non-existing nodes of the key, and mark the end of the word for the last node. If the input key is a prefix of the existing key in Trie, we simply mark the last node of the key as the end of a word. The key length determines Trie depth.


1 Answers

Each of these options has their advantages and disadvantages.

If you store the child nodes in an array, then you can look up which child to visit extremely efficiently by just indexing into the array. However, the space usage per node will be high: O(|Σ|), where Σ is the set of letters that your words can be formed from, even if most of those children are null.

If you store the child nodes in a linked list, then the time required to find a child will be O(|Σ|), since you may need to scan across all of the nodes of the linked list to find the child you want. On the other hand, the space efficiency will be quite good, because you only store the children that you're using. You could also consider using a fixed-sized array here, which has even better space usage but leads to very expensive insertions and deletions.

If you store the child nodes in a hash table, then the (expected) time to find a child will be O(1) and the memory usage will only be proportional (roughly) to the number of children you have. Interestingly, because you know in advance what values you're going to be hashing, you could consider using a dynamic perfect hash table to ensure worst-case O(1) lookups, at the expense of some precomputation.

Another option would be to store the child nodes in a binary search tree. This gives rise to the ternary search tree data structure. This choice is somewhere between the linked list and hash table options - the space usage is low and you can perform predecessor and successor queries efficiently, but there's a slight increase in the cost of performing a lookup due to the search cost in the BST. If you have a static trie where insertions never occur, you can consider using weight-balanced trees as the BSTs at each point; this gives excellent runtime for searches (O(n + log k), where n is the length of the string to search for and k is the total number of words in the trie).

In short, the array lookups are fastest but its space usage in the worst case is the worst. A statically-sized array has the best memory usage but expensive insertions and deletions. The hash table has decently fast lookups and good memory usage (on average). Binary search trees are somewhere in the middle. I would suggest using the hash table here, though if you put a premium on space and don't care about lookup times the linked list might be better. Also, if your alphabet is small (say, you're making a binary trie), the array overhead won't be too bad and you may want to use that.

Hope this helps!

like image 157
templatetypedef Avatar answered Sep 18 '22 09:09

templatetypedef