Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Auto correct algorithm

I want to implement the following in C++:

1) Check whether the given word exists in a dictionary. The dictionary file is a huge file; consider 100MB or 3-4 Million words.

2) Suggest corrections for the incorrect word.

3) Autocomplete feature.

My Approach

1) I am planning to build a tree so searching will efficient.

2) I am not getting how to implement auto correction feature.

3) I can implement auto complete feature using trees

My tree Image

What's the best data structure and algorithm to implement all the above features?

like image 684
Ankith Avatar asked Dec 11 '13 07:12

Ankith


People also ask

What algorithm is used for AutoCorrect?

The basic algorithm behind autocorrection software like T9 is pretty simple. The system is essentially the same as a word processor's spell checker—as you type, the software checks each word against a built-in dictionary, and it suggests alternatives when it doesn't find a match.

What is AutoCorrect based on?

Auto-correct is a feature available on Android and Apple smartphones, PCs and applications like Outlook. Its goal is to make typing a little easier by automatically changing words, letters or punctuation it thinks you entered incorrectly.

Does auto-correct use AI?

Inside The AI: how we build our autocorrect, emoji and text prediction engine. Although it's Typewise's unique hexagonal keyboard that tends to catch people's eye at first, it's our advanced autocorrect and text prediction AI under the hood that provides the power behind our product.


2 Answers

I have been working on the same problem. So far the best solution I have come across is using a ternary search tree for auto completion. Ternary Search Trees are more space efficient than tries. If im unable to find the looked up string in my ternary search tree then I use an already built BK Tree for finding the closest match. BK Tree internally uses Levenshtein distance. You

Metaphones are also something you might want to explore however I havent gone into the depth of metaphones.

I have a solution in Java for BK TREE + TERNARY SEARCH TREE if you like.

like image 91
Pawan Avatar answered Oct 04 '22 10:10

Pawan


You can do autocomplete by looking at all the strings in a given subtree. Some score to help you pick might help. This works something like if you have "te" you go down that path in the trie and the traverse the entire subtree there to get all the possible endings.

For corrections you need to implement something like http://en.wikipedia.org/wiki/Levenshtein_distance over the tree. You can use the fact that if you processed a given path in the trie, you can reuse the result for all the strings in the subtree rooted at the end of your path.

like image 24
Sorin Avatar answered Oct 04 '22 08:10

Sorin