Trie or balanced binary search tree to store a dictionary?

Tags: dictionary, algorithm, data-structures, tree

I have a simple (perhaps hypothetical) requirement:

I want to store an English word dictionary of n words and, given a word of m characters, have the dictionary tell me whether that word exists in it. What would be an appropriate data structure for this?

A balanced binary search tree, as used by C++ STL associative containers such as set and map?

A trie on strings?

Some complexity analysis: in a balanced BST, a lookup takes O(m log n) time, since comparing two strings character by character takes O(m).

In a trie, if we could branch in O(1) time at each node, a lookup would take O(m). But that assumption does not hold without a cost: each node can have up to 26 branches, so to get O(1) branching we would have to keep a small array indexed by character at every node, which blows up the space. After a few levels in the trie the branching decreases, so it is better to keep a linked list of next characters and child pointers at each node.

Which looks more practical? Are there any other trade-offs?
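To make the space/time trade-off concrete, here is a minimal sketch of the array-per-node variant described above. It assumes words contain only lowercase ASCII letters 'a' to 'z'; the names Trie, Node, insert and contains are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <memory>
#include <string>

// Hypothetical trie for lowercase ASCII words ('a'..'z' only).
class Trie {
    struct Node {
        // One slot per letter: O(1) branching, but 26 pointers per node.
        std::array<std::unique_ptr<Node>, 26> child{};
        bool is_word = false;
    };
    Node root_;

    // Maps 'a'..'z' to 0..25; returns -1 for any other character.
    static int index_of(char c) {
        return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
    }

public:
    // Inserts a word in O(m). Returns false and changes nothing
    // if the word contains a character outside 'a'..'z'.
    bool insert(const std::string& word) {
        for (char c : word) {
            if (index_of(c) < 0) return false;
        }
        Node* node = &root_;
        for (char c : word) {
            auto& next = node->child[index_of(c)];
            if (!next) next = std::make_unique<Node>();
            node = next.get();
        }
        node->is_word = true;
        return true;
    }

    // Looks a word up in O(m), independent of the number of stored words.
    bool contains(const std::string& word) const {
        const Node* node = &root_;
        for (char c : word) {
            int i = index_of(c);
            if (i < 0 || !node->child[i]) return false;
            node = node->child[i].get();
        }
        return node->is_word;
    }
};
```

After inserting "cat" and "car", contains("cat") is true while contains("ca") is false, because only complete words are marked. Replacing the fixed array with a linked list or small sorted vector of (character, child) pairs would shrink sparse nodes at the cost of a short scan per character.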

Thanks,


asked Jun 08 '11 13:06

xyz

3 Answers

I'd say use a trie or, better yet, its more space-efficient cousin, the Directed Acyclic Word Graph (DAWG).

It has the same runtime characteristics (insert, lookup, delete) as a trie, but it overlaps common suffixes as well as common prefixes, which can be a big saving on space.


answered Sep 29 '22 11:09

luke

If this is C++, you should also consider std::tr1::unordered_set, or std::unordered_set if you have C++0x (since standardized as C++11).

This just uses a hash table internally, which I would wager will outperform any tree-like structure in practice. It is also trivial to implement because you have nothing to implement.
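As a rough illustration of how little code the hash-table approach needs, here is a minimal sketch using std::unordered_set from C++11. The helper names make_dictionary and in_dictionary are hypothetical, not part of any library.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Builds a hash-based dictionary from a word list (hypothetical helper).
std::unordered_set<std::string> make_dictionary(const std::vector<std::string>& words) {
    return std::unordered_set<std::string>(words.begin(), words.end());
}

// Average-case lookup: hashing the word costs O(m), followed by
// string comparisons against the entries in a single bucket.
bool in_dictionary(const std::unordered_set<std::string>& dict, const std::string& word) {
    return dict.find(word) != dict.end();
}
```

Unlike a trie or a sorted tree, the hash set supports only exact-match lookups; it cannot answer prefix queries or iterate over words in sorted order.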

answered Sep 29 '22 10:09

Nemo

Binary search is going to be easier to implement, and it will only involve comparing tens of strings at most. Given that you know the data up front, you can build a balanced binary tree, so performance will be predictable and easily understood.

With that in mind, I'd use a standard binary tree (probably set from the C++ standard library, since that is typically implemented as a tree).
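A minimal sketch of this approach, with two variants: std::set as recommended above, and, as a swapped-in alternative for data known up front, a sorted std::vector searched with std::binary_search. The function names in_set, make_sorted and in_sorted are hypothetical.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Variant 1: std::set, typically a balanced tree.
// A lookup performs O(log n) string comparisons.
bool in_set(const std::set<std::string>& dict, const std::string& word) {
    return dict.count(word) > 0;
}

// Variant 2: sort the word list once and drop duplicates.
std::vector<std::string> make_sorted(std::vector<std::string> words) {
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
    return words;
}

// Binary search over the sorted list: also O(log n) string comparisons.
bool in_sorted(const std::vector<std::string>& sorted_words, const std::string& word) {
    return std::binary_search(sorted_words.begin(), sorted_words.end(), word);
}
```

For a dictionary of roughly 100,000 words, log2(n) is about 17, which is in line with the "tens of strings" estimate. The sorted vector avoids per-node allocations but is only convenient when the word list rarely changes.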

answered Sep 29 '22 12:09

Jeff Foster
