Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I use a Trie for spell checking

I have a trie that I've built from a dictionary of words. I want to use this for spell checking( and suggest closest matches in the dictionary , maybe for a given number of edits x). I'm thinking I'd use levenshtein distance between the target word and words in my dictionary, but is there a smart way to traverse the trie without actually running the edit distance logic over each word separately? How should I do the traversal and the edit distance matching?

For e.g, if I have words MAN, MANE, I should be able to reuse the edit distance computation on MAN in MANE. Otherwise the Trie wouldnt serve any purpose

like image 550
Aks Avatar asked Jan 26 '14 17:01

Aks


People also ask

How do you trigger a spell check?

Here's how. Click File > Options > Proofing, clear the Check spelling as you type box, and click OK. To turn spell check back on, repeat the process and select the Check spelling as you type box. To check spelling manually, click Review > Spelling & Grammar.

What is the best free spell check app?

Grammarly Keyboard is the app's mobile keyboard extension that works with both iOS and Android devices. If you're using Grammarly for basic grammar checking, you'll benefit the most from the free version.

What algorithm does spell check use?

Spell checkers can use approximate string matching algorithms such as Levenshtein distance to find correct spellings of misspelled words. An alternative type of spell checker uses solely statistical information, such as n-grams, to recognize errors instead of correctly-spelled words.


2 Answers

I think you should instead give a try to bk-trees; it's a data structure that fits well spell-checking as it will allow you to compute efficiently the edit distance with the words of your dictionary.

This link gives a good insight into BK-trees applied to spell-checking

like image 82
pierroz Avatar answered Nov 01 '22 19:11

pierroz


Try computing for each tree node an array A where A[x] the smallest edit distance to be at that position in the trie after matching the first x letters of the target word.

You can then stop examining any nodes if every element in the array is greater than your target distance.

For example, with a trie containing MAN and MANE and an input BANE:

Node 0 representing '', A=[0,1,2,3,4]
Node 1 representing 'M', A=[1,1,2,3,4]
Node 2 representing 'MA', A=[2,1,1,2,3]
Node 3 representing 'MAN' A=[3,2,2,1,2]
Node 4 representing 'MANE' A=[4,3,2,2,1]

The smallest value for A[end] is 1 reached with the word 'MANE' so this is the best match.

like image 22
Peter de Rivaz Avatar answered Nov 01 '22 18:11

Peter de Rivaz