What is a good algorithm to traverse a Trie to check for spelling suggestions?

Tags: algorithm, spell-checking, dynamic-programming, trie

Assuming that a general Trie of dictionary words has been built, what would be the best way to check for the four kinds of spelling mistakes (substitution, deletion, transposition, and insertion) during traversal?

One method is to generate all the words within edit distance n of a given word and then check for each of them in the Trie. This isn't a bad option, but a better approach seems to be to use a dynamic programming (or an equivalent recursive) method that applies the edits during traversal and determines which sub-tries are worth following.
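For reference, the first method (enumerate every word one edit away, then look each one up) might look roughly like the sketch below. It follows the candidate generation in Peter Norvig's spell corrector; the names `TrieNode`, `build_trie`, `contains`, `edits1`, and `suggestions_by_enumeration` are made up for this illustration, and a lowercase ASCII alphabet is assumed.

```python
import string


class TrieNode:
    """One trie node: child links by letter plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Insert every word into a fresh trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def contains(root, word):
    """Exact lookup: follow the word letter by letter."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


def edits1(word, alphabet=string.ascii_lowercase):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    substitutes = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + substitutes + inserts)


def suggestions_by_enumeration(root, word):
    """Keep only the candidates that are real dictionary words."""
    return sorted(w for w in edits1(word) if contains(root, w))
```

Every candidate costs one trie lookup whether or not any dictionary word shares its prefix, which is the waste that applying the edits during traversal is meant to avoid.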

Any ideas would be welcome!

PS, would appreciate actual inputs rather than just links in answers.


asked Jul 14 '10 22:07

viksit

2 Answers

I actually wrote some code to do this the other day:

https://bitbucket.org/teoryn/spell-checker/src/tip/spell_checker.py

It's based on the code by Peter Norvig (http://norvig.com/spell-correct.html) but stores the dictionary in a trie for finding words within a given edit distance faster.

The algorithm walks the trie recursively applying the possible edits (or not) at each step along the way by consuming letters from the input word. A parameter to the recursive call states how many more edits can be made. The trie helps narrow the search space by checking which letters can actually be reached from our given prefix. For example, when inserting a character, instead of adding each letter in the alphabet, we only add letters that are reachable from the current node. Not making an edit is equivalent to taking the branch from the current node in the trie along the current letter from the input word. If that branch is not there then we can backtrack and avoid searching a possibly large space where no real words could be found.
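A minimal sketch of that recursive walk, written for illustration rather than copied from the linked repository: `TrieNode`, `build_trie`, `search`, and `walk` are hypothetical names, and edits are described relative to the input word (an insertion adds a trie letter the input lacks, a deletion drops an input letter).

```python
class TrieNode:
    """One trie node: child links by letter plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Insert every word into a fresh trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def search(root, word, max_edits):
    """Return {dictionary word: fewest edits found}, using at most max_edits."""
    found = {}

    def walk(node, prefix, rest, edits_left):
        # Input fully consumed on a word node: record the cheapest route.
        if not rest and node.is_word:
            used = max_edits - edits_left
            if used < found.get(prefix, max_edits + 1):
                found[prefix] = used
        # No edit: follow the branch for the next input letter, if present.
        if rest and rest[0] in node.children:
            walk(node.children[rest[0]], prefix + rest[0], rest[1:], edits_left)
        if edits_left == 0:
            return  # budget spent, so this branch is pruned here
        for ch, child in node.children.items():
            # Insertion: only letters reachable from this node are tried.
            walk(child, prefix + ch, rest, edits_left - 1)
            # Substitution: replace the next input letter with a reachable one.
            if rest and ch != rest[0]:
                walk(child, prefix + ch, rest[1:], edits_left - 1)
        if rest:
            # Deletion: drop the next input letter and stay on this node.
            walk(node, prefix, rest[1:], edits_left - 1)
        if len(rest) > 1:
            # Transposition: consume the next two input letters in swapped order.
            first = node.children.get(rest[1])
            second = first.children.get(rest[0]) if first else None
            if second is not None:
                walk(second, prefix + rest[1] + rest[0], rest[2:], edits_left - 1)

    walk(root, "", word, max_edits)
    return found
```

With a dictionary of `cat`, `cart`, `act`, `coat`, and `dog`, calling `search(root, "cat", 1)` reaches `cart` and `coat` by insertion and `act` by transposition, and never descends into the `d` subtree beyond its first node.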


answered Sep 22 '22 17:09

Kevin Stock

I think you can do this with a straightforward breadth-first search on the trie. Choose a threshold for the number of errors you will tolerate, then run through the letters of the word to be matched one at a time, maintaining the set of (prefix, subtrie) pairs reached so far. While a pair is beneath the error threshold, add to your set of next subgoals:

No error at this character position: add the subtrie reached by following the next character of the word.
An inserted, deleted, or substituted character at this position: find the corresponding subtrie and increment the error count.
Not an additional subgoal, but note that a transposition is an insertion or deletion that matches an earlier deletion or insertion: if this test holds, don't increment the error count.

This seems pretty naive: is there a problem with this that led you to think of dynamic programming?
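The breadth-first idea above can be sketched as follows. This is one possible reading, not a reference implementation: the names `TrieNode`, `build_trie`, `offer`, `expand`, and `bfs_suggestions` are invented here, states are keyed by prefix (which identifies a unique trie node), and the transposition test is left out, so a swapped pair of letters counts as two errors.

```python
class TrieNode:
    """One trie node: child links by letter plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Insert every word into a fresh trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def offer(states, prefix, node, errors, limit):
    """Record a state if it is within the limit and beats the stored count."""
    if errors > limit:
        return False
    if prefix in states and states[prefix][1] <= errors:
        return False
    states[prefix] = (node, errors)
    return True


def expand(states, limit):
    """Add states reached by trie letters that are missing from the typed word."""
    stack = list(states.items())
    while stack:
        prefix, (node, errors) = stack.pop()
        for ch, child in node.children.items():
            if offer(states, prefix + ch, child, errors + 1, limit):
                stack.append((prefix + ch, (child, errors + 1)))
    return states


def bfs_suggestions(root, word, limit):
    """Return {dictionary word: error count} for words within `limit` errors."""
    frontier = expand({"": (root, 0)}, limit)
    for letter in word:
        nxt = {}
        for prefix, (node, errors) in frontier.items():
            # The typed letter is extra: stay on this node, one more error.
            offer(nxt, prefix, node, errors + 1, limit)
            for ch, child in node.children.items():
                # A matching letter is free; any other letter is a substitution.
                cost = 0 if ch == letter else 1
                offer(nxt, prefix + ch, child, errors + cost, limit)
        frontier = expand(nxt, limit)
    return {p: e for p, (node, e) in frontier.items() if node.is_word}
```

The frontier after each letter holds every trie prefix still within the threshold, so branches that exceed it are dropped as soon as they do.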

answered Sep 21 '22 17:09

Charles Stewart
