I'm using NHunspell to check a string for spelling errors like so:
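```csharp
using System.Linq;
using NHunspell;

// Naive tokenization: split on spaces only, so punctuation stays attached
var words = content.Split(' ');
string[] incorrect;
using (var spellChecker = new Hunspell(affixFile, dictionaryFile))
{
    // Keep every token that Hunspell does not recognize
    incorrect = words.Where(x => !spellChecker.Spell(x))
                     .ToArray();
}
```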
This generally works, but it has some problems. For example, when I check the sentence "This is a (very good) example", it reports "(very" and "good)" as misspelled. If the string contains a time such as "8:30", that is reported as a misspelled word too, and commas and other punctuation cause similar false positives.
Microsoft Word is smart enough to recognize a time, fraction, or comma-delimited list of words. It knows when not to use an English dictionary, and it knows when to ignore symbols. How can I get a similar, more intelligent spell check in my software? Are there any libraries that provide a little more intelligence?
EDIT: I don't want to force users to have Microsoft Word installed on their machine, so using COM interop is not an option.
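One way to approximate Word's behavior without Word is to classify tokens before spell-checking them. Times, fractions and plain numbers are recognized explicitly, runs of letters are pulled out as words, and everything else (parentheses, commas, other symbols) is never matched, so it is silently skipped. Only the word tokens would then be passed to `spellChecker.Spell`. The sketch below is in Python and assumes a hypothetical pattern and helper (`TOKEN_RE`, `words_to_check`); neither is part of NHunspell. The same regex should carry over to .NET's `System.Text.RegularExpressions` with minor syntax changes (for example, .NET writes named groups as `(?<name>...)` rather than `(?P<name>...)`).

```python
import re

# Hypothetical token pattern, tried left to right at each position:
# times like 8:30, fractions like 3/4, plain numbers (optional decimals),
# and words (letters, optionally joined by internal apostrophes or hyphens).
# Characters matched by none of these (parentheses, commas, ...) are skipped.
TOKEN_RE = re.compile(
    r"(?P<time>\b\d{1,2}:\d{2}\b)"
    r"|(?P<fraction>\b\d+/\d+\b)"
    r"|(?P<number>\b\d+(?:\.\d+)?\b)"
    r"|(?P<word>[^\W\d_]+(?:['-][^\W\d_]+)*)"
)


def words_to_check(text):
    """Return only the tokens that should be sent to the spell checker."""
    return [m.group("word") for m in TOKEN_RE.finditer(text) if m.group("word")]


print(words_to_check("This is a (very good) example at 8:30, about 3/4 done."))
# → ['This', 'is', 'a', 'very', 'good', 'example', 'at', 'about', 'done']
```

Because times, fractions and numbers are matched by their own groups before the word group is tried, "8:30" and "3/4" are consumed as single non-word tokens instead of being split into fragments that the dictionary would reject.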
If your spell checker is really that stupid, you should pre-tokenize its input to get the words out and feed those one at a time (or as a string joined with spaces). I'm not familiar with C#/.NET, but in Python, you'd use a simple RE like `\w+` for that:
```python
>>> import re
>>> s = "This is a (very good) example"
>>> re.findall(r"\w+", s)
['This', 'is', 'a', 'very', 'good', 'example']
```
and I bet .NET has something very similar. In fact, according to the .NET docs, `\w` is supported, so you just have to find out how `re.findall` is called there.
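Following that suggestion, here is a minimal Python sketch of the whole pipeline, mirroring the LINQ query in the question: tokenize with `\w+`, drop tokens that contain digits, and keep the words the checker rejects. The digit filter is needed because `\w` also matches digits (and the underscore), so "8:30" would otherwise yield "8" and "30". `find_misspellings` and the toy `KNOWN` set are hypothetical stand-ins; in the C# code the predicate would be `spellChecker.Spell`, and the .NET counterpart of `re.findall` is `Regex.Matches` in `System.Text.RegularExpressions`.

```python
import re


def find_misspellings(text, is_correct):
    """Return the regex word tokens that is_correct rejects.

    is_correct stands in for Hunspell.Spell: any callable that takes a
    word and returns True if it is spelled correctly.
    """
    tokens = re.findall(r"\w+", text)
    # \w also matches digits, so "8:30" yields "8" and "30";
    # skip tokens containing digits instead of sending them to the checker.
    words = [t for t in tokens if not any(c.isdigit() for c in t)]
    return [w for w in words if not is_correct(w)]


# Toy dictionary standing in for a real Hunspell dictionary (hypothetical).
KNOWN = {"this", "is", "a", "very", "good", "example", "at"}

print(find_misspellings("This is a (very goood) example at 8:30",
                        lambda w: w.lower() in KNOWN))
# → ['goood']
```

One caveat of plain `\w+`: it splits contractions such as "don't" into "don" and "t", so a pattern that allows internal apostrophes may produce fewer false positives.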