Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

trie or balanced binary search tree to store dictionary?

I have a simple requirement (perhaps hypothetical):

I want to store english word dictionary (n words) and given a word (character length m), the dictionary is able to tell, if the word exists in dictionary or not. What would be an appropriate data structure for this?

a balanced binary search tree? as done in C++ STL associative data structures like set,map

or

a trie on strings

Some complexity analysis: in a balanced bst, time would be (log n)*m (comparing 2 string takes O(m) time character by character)

in trie, if at each node, we could branch out in O(1) time, we can find using O(m), but the assumption that at each node, we can branch in O(1) time is not valid. at each node, max possible branches would be 26. if we want O(1) at a node, we will keep a short array indexible on characters at each node. This will blow-up the space. After a few levels in the trie, branching will reduce, so its better to keep a linked list of next node characters and pointers.

what looks more practical? any other trade-offs?

Thanks,

like image 242
xyz Avatar asked Jun 08 '11 13:06

xyz


People also ask

What is the difference between the binary tree and trie?

Unlike a binary search tree, nodes in the trie do not store their associated key. Instead, a node's position in the trie defines the key with which it is associated. This distributes the value of each key across the data structure, and means that not every node necessarily has an associated value.

What are the advantages of using binary search tree for a dictionary?

The advantage of using the BST is that all major operations (insert, search, and remove) are Θ(logn) in the average case. Of course, if the tree is badly balanced, then the cost can be as bad as Θ(n). Here is an implementation for the Dictionary interface, using a BST to store the records.

When should we use trie?

A Trie (usually pronounced “try”) is a tree data structure optimized for a specific type of searching. You use a Trie when you want to take a partial value and return a set of possible complete values. The classic example for this is an Autocomplete.


3 Answers

I'd say use a Trie, or better yet use its more space efficient cousin the Directed Acyclic Word Graph (DAWG).

It has the same runtime characteristics (insert, look up, delete) as a Trie but overlaps common suffixes as well as common prefixes which can be a big saving on space.

like image 125
luke Avatar answered Sep 29 '22 11:09

luke


If this is C++, you should also consider std::tr1::unordered_set. (If you have C++0x, you can use std::unordered_set.)

This just uses a hash table internally, which I would wager will out-perform any tree-like structure in practice. It is also trivial to implement because you have nothing to implement.

like image 23
Nemo Avatar answered Sep 29 '22 10:09

Nemo


Binary search is going to be easier to implement and it's only going to involve comparing tens of strings at the most. Given you know the data up front, you can build a balanced binary tree so performance is going to be predictable and easily understood.

With that in mind, I'd use a standard binary tree (probably using set from C++ since that's typically implemented as a tree).

like image 26
Jeff Foster Avatar answered Sep 29 '22 12:09

Jeff Foster