
What is a good algorithm to traverse a Trie to check for spelling suggestions?

Assuming that a general Trie of dictionary words has been built, what would be the best method to check for the four kinds of spelling mistakes (substitution, deletion, transposition, and insertion) during traversal?

One method is to generate all the words within n edits of a given word and then check for each of them in the Trie. This isn't a bad option, but a better intuition here seems to be to use a dynamic programming (or an equivalent recursive) method to determine the best sub-tries after having modified the word during traversal.
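The dynamic-programming intuition above is commonly implemented by keeping one edit-distance row per trie node, so a prefix shared by many words is computed only once. Below is a minimal sketch, assuming a simple class-based trie; it uses the optimal-string-alignment variant of Levenshtein distance so that an adjacent transposition costs one edit, which covers all four mistake types. `TrieNode`, `build_trie`, and `search` are illustrative names, not from any library.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set when a dictionary word ends here


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def search(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (word, distance) pairs for dictionary words within max_dist of
    query, using optimal-string-alignment distance."""
    results: List[Tuple[str, int]] = []
    # Row for the empty trie prefix: distance from "" to each prefix of query.
    first_row = list(range(len(query) + 1))

    def walk(node: TrieNode, ch: str, prev_ch: Optional[str],
             prev_row: List[int], prev_prev_row: Optional[List[int]]) -> None:
        # One new DP row per trie edge; rows of shared prefixes are reused.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == ch else 1
            best = min(row[j - 1] + 1,          # query has an extra letter
                       prev_row[j] + 1,         # query is missing a letter
                       prev_row[j - 1] + cost)  # match or substitution
            if (prev_prev_row is not None and j > 1
                    and ch == query[j - 2] and prev_ch == query[j - 1]):
                best = min(best, prev_prev_row[j - 2] + 1)  # adjacent swap
            row.append(best)
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # Prune: the row minimum never decreases further down the trie,
        # so stop once every entry exceeds the budget.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, ch, row, prev_row)

    for ch, child in root.children.items():
        walk(child, ch, None, first_row, None)
    return sorted(results, key=lambda r: (r[1], r[0]))
```

For example, with a trie built from `["the", "then", "they", "hello", "help", "world"]`, `search(trie, "teh", 1)` returns `[("the", 1)]`. The pruning step is what makes this cheaper than enumerating edits: whole subtries are skipped as soon as no alignment can stay within the budget.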

Any ideas would be welcome!

P.S. I would appreciate actual input rather than just links in answers.

asked Jul 14 '10 22:07 by viksit



2 Answers

I actually wrote some code to do this the other day:

https://bitbucket.org/teoryn/spell-checker/src/tip/spell_checker.py

It's based on Peter Norvig's code (http://norvig.com/spell-correct.html) but stores the dictionary in a trie so that words within a given edit distance can be found faster.
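For comparison, the enumerate-then-look-up baseline (the first method in the question, in the style of Norvig's corrector) can be sketched as below. This is not the linked code; it assumes lowercase a-z words and a dict-of-dicts trie with a hypothetical `"$"` end-of-word key, and `edits1`/`candidates` are illustrative names.

```python
import string
from typing import Dict, Iterable, Set

END = "$"  # hypothetical end-of-word marker; assumes words never contain "$"


def build_trie(words: Iterable[str]) -> Dict:
    root: Dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def in_trie(trie: Dict, word: str) -> bool:
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


def edits1(word: str, alphabet: str = string.ascii_lowercase) -> Set[str]:
    """Every string one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | transposes | replaces | inserts


def candidates(trie: Dict, word: str) -> Set[str]:
    # Generate blindly, then filter: the trie is only used for membership tests.
    return {w for w in edits1(word) if in_trie(trie, w)}
```

With 26 letters, a word of length n yields 54n + 25 generated strings before deduplication, and going to two edits means applying `edits1` to each of those, which is why walking the trie directly pays off.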

The algorithm walks the trie recursively, applying the possible edits (or no edit) at each step while consuming letters from the input word. A parameter to the recursive call says how many more edits may be made. The trie narrows the search space by showing which letters can actually be reached from the current prefix: when inserting a character, for example, instead of trying every letter of the alphabet, we only try letters that are reachable from the current node. Not making an edit is equivalent to following the branch for the current input letter from the current node. If that branch does not exist, we can backtrack and avoid searching a possibly large space where no real words could be found.
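A minimal sketch of that recursive walk, assuming a dict-of-dicts trie with a hypothetical `"$"` end-of-word key (the linked code may be structured differently; `suggest` and `walk` are illustrative names). `edits_left` plays the role of the "how many more edits" parameter.

```python
from typing import Dict, Iterable, Set

END = "$"  # hypothetical end-of-word marker; assumes words never contain "$"


def build_trie(words: Iterable[str]) -> Dict:
    root: Dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def suggest(trie: Dict, word: str, max_edits: int) -> Set[str]:
    results: Set[str] = set()

    def walk(node: Dict, i: int, edits_left: int, prefix: str) -> None:
        if i == len(word) and END in node:
            results.add(prefix)
        # No edit: follow the branch for the current input letter, if it exists.
        if i < len(word) and word[i] in node:
            walk(node[word[i]], i + 1, edits_left, prefix + word[i])
        if edits_left == 0:
            return
        # Deletion: the input has an extra letter; skip it without moving in the trie.
        if i < len(word):
            walk(node, i + 1, edits_left - 1, prefix)
        for ch, child in node.items():
            if ch == END:
                continue
            # Insertion: the input is missing a letter; only reachable letters are tried.
            walk(child, i, edits_left - 1, prefix + ch)
            # Substitution: replace the current input letter with a reachable one.
            if i < len(word) and ch != word[i]:
                walk(child, i + 1, edits_left - 1, prefix + ch)
        # Transposition: the next two input letters are swapped.
        if (i + 1 < len(word) and word[i + 1] in node
                and word[i] in node[word[i + 1]]):
            walk(node[word[i + 1]][word[i]], i + 2, edits_left - 1,
                 prefix + word[i + 1] + word[i])

    walk(trie, 0, max_edits, "")
    return results
```

For instance, with `["the", "then", "they", "tech", "ten", "help", "hello"]`, `suggest(trie, "helo", 1)` returns `{"hello", "help"}`. A result set is used because different edit sequences can reach the same word.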

answered Sep 22 '22 17:09 by Kevin Stock


I think you can do this with a straightforward breadth-first search on the trie: choose a threshold for the number of errors you are looking for, then run through the letters of the word to be matched one at a time, generating the set of (prefix, subtrie) pairs reached so far that match the prefix. While you are below your error threshold, add to your set of next subgoals:

  1. No error at this character position: add the subgoal of the subtrie at the next character in the word.
  2. An inserted, deleted, or substituted character at this position: find the appropriate subtrie there and increment the error count.
  3. Not an additional goal, but note that a transposition is an insertion or deletion that matches an earlier deletion or insertion: if this test holds, don't increment the error count.
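One way to realise this frontier idea is sketched below, assuming a dict-of-dicts trie with a hypothetical `"$"` end-of-word key. `levels[i]` holds the (prefix, subtrie) pairs reached after consuming `word[:i]`, with the fewest errors seen for each. As a simplification, a transposition is handled as a direct two-letter swap costing one error rather than by the insertion/deletion matching of point 3. `bfs_suggest` and `relax` are illustrative names.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

END = "$"  # hypothetical end-of-word marker; assumes words never contain "$"


def build_trie(words: Iterable[str]) -> Dict:
    root: Dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def bfs_suggest(trie: Dict, word: str, max_errors: int) -> Set[str]:
    n = len(word)
    # levels[i]: trie prefix -> (fewest errors, subtrie) after consuming word[:i]
    levels: List[Dict[str, Tuple[int, Dict]]] = [{} for _ in range(n + 1)]

    def relax(i: int, prefix: str, node: Dict, errors: int) -> bool:
        level = levels[i]
        if errors > max_errors or (prefix in level and level[prefix][0] <= errors):
            return False
        level[prefix] = (errors, node)
        return True

    relax(0, "", trie, 0)
    for i in range(n + 1):
        # Letters missing from the input consume no input, so close the frontier first.
        queue = deque(levels[i])
        while queue:
            prefix = queue.popleft()
            errors, node = levels[i][prefix]
            for ch, child in node.items():
                if ch != END and relax(i, prefix + ch, child, errors + 1):
                    queue.append(prefix + ch)
        if i == n:
            break
        c = word[i]
        for prefix, (errors, node) in list(levels[i].items()):
            if c in node:  # no error at this position
                relax(i + 1, prefix + c, node[c], errors)
            relax(i + 1, prefix, node, errors + 1)  # extra input letter: skip it
            for ch, child in node.items():  # substituted letter
                if ch != END and ch != c:
                    relax(i + 1, prefix + ch, child, errors + 1)
            if i + 1 < n and word[i + 1] in node and c in node[word[i + 1]]:
                relax(i + 2, prefix + word[i + 1] + c, node[word[i + 1]][c], errors + 1)
    return {p for p, (_, node) in levels[n].items() if END in node}
```

Because each level keeps only the cheapest error count per prefix, duplicate paths to the same subtrie collapse, which is the main practical difference from the unrestricted recursive walk.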

This seems pretty naive: is there a problem with this that led you to think of dynamic programming?

answered Sep 21 '22 17:09 by Charles Stewart