
Intelligent spell checking

I'm using NHunspell to check a string for spelling errors like so:

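// Requires: using System.Linq; using NHunspell;
var words = content.Split(' ');  // naive split: punctuation stays attached to words
string[] incorrect;
using (var spellChecker = new Hunspell(affixFile, dictionaryFile))
{
    // Keep every token the dictionary does not recognize
    incorrect = words.Where(x => !spellChecker.Spell(x))
        .ToArray();
}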

This generally works, but it has some problems. For example, if I'm checking the sentence "This is a (very good) example", it will report "(very" and "good)" as being misspelled. Or if the string contains a time such as "8:30", it will report that as a misspelled word. It also has problems with commas, etc.

Microsoft Word is smart enough to recognize a time, fraction, or comma-delimited list of words. It knows when not to use an English dictionary, and it knows when to ignore symbols. How can I get a similar, more intelligent spell check in my software? Are there any libraries that provide a little more intelligence?

EDIT: I don't want to force users to have Microsoft Word installed on their machine, so using COM interop is not an option.

Phil asked Mar 09 '12 17:03



1 Answer

If your spell checker is really that stupid, you should pre-tokenize its input to get the words out and feed those to it one at a time (or as a single string joined with spaces). I'm not familiar with C#/.NET, but in Python, you'd use a simple regular expression like \w+ for that:

>>> import re
>>> s = "This is a (very good) example"
>>> re.findall(r"\w+", s)
['This', 'is', 'a', 'very', 'good', 'example']

and I bet .NET has something very similar. In fact, according to the .NET docs, \w is supported, and the closest counterpart of re.findall there is Regex.Matches.
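Note that plain \w+ also matches digits, so "8:30" would still come out as the tokens "8" and "30", and "don't" would be split at the apostrophe. A slightly stricter pattern can skip numbers and keep apostrophes and hyphens inside words. The following is only a sketch under assumptions: extract_words, find_misspelled, KNOWN and toy_is_correct are hypothetical names made up for illustration, and the toy dictionary stands in for a real checker such as Hunspell.

```python
import re

# A word is a run of letters, optionally joined by internal apostrophes or
# hyphens. [^\W\d_] means "a word character that is not a digit or underscore",
# i.e. a letter. The \b anchors reject fragments glued to digits, such as the
# "rd" in "3rd".
WORD_RE = re.compile(r"\b[^\W\d_]+(?:['’-][^\W\d_]+)*\b")


def extract_words(text):
    """Return the alphabetic tokens in text, dropping punctuation and numbers."""
    return WORD_RE.findall(text)


def find_misspelled(text, is_correct):
    """Return the tokens for which the is_correct callback returns False."""
    return [word for word in extract_words(text) if not is_correct(word)]


# Toy dictionary for demonstration only (an assumption, not a real word list).
KNOWN = {"this", "is", "a", "very", "good", "example", "at"}


def toy_is_correct(word):
    """Hypothetical stand-in for a real spell checker's lookup."""
    return word.lower() in KNOWN


print(extract_words("This is a (very good) example"))
print(find_misspelled("This is a (verry good) example at 8:30", toy_is_correct))
```

With this pattern, times and fractions such as "8:30" or "1/2" never reach the spell checker at all. Hyphenated compounds are passed through as a single token, which may or may not suit your dictionary, so splitting on the hyphen first is a reasonable alternative.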

Fred Foo answered Sep 28 '22 02:09