Intelligent spell checking

I'm using NHunspell to check a string for spelling errors like so:

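using System.Linq;
using NHunspell;

// Naive tokenization: splitting on spaces leaves punctuation attached to words.
var words = content.Split(' ');
string[] incorrect;
using (var spellChecker = new Hunspell(affixFile, dictionaryFile))
{
    // Collect every token that Hunspell does not recognize.
    incorrect = words.Where(x => !spellChecker.Spell(x))
        .ToArray();
}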

This generally works, but it has some problems. For example, if I'm checking the sentence "This is a (very good) example", it will report "(very" and "good)" as being misspelled. If the string contains a time such as "8:30", it will report that as a misspelled word too. It has the same problem with commas and other punctuation attached to words.

Microsoft Word is smart enough to recognize a time, fraction, or comma-delimited list of words. It knows when not to use an English dictionary, and it knows when to ignore symbols. How can I get a similar, more intelligent spell check in my software? Are there any libraries that provide a little more intelligence?

EDIT: I don't want to force users to have Microsoft Word installed on their machine, so using COM interop is not an option.

asked Mar 09 '12 17:03

Phil

1 Answer

If your spell checker really is that naive, pre-tokenize its input: pull the words out first, then feed them to it one at a time (or as a single string joined with spaces). I'm not familiar with C#/.NET, but in Python you'd use a simple regular expression like \w+ for that:

>>> import re
>>> s = "This is a (very good) example"
>>> re.findall(r"\w+", s)
['This', 'is', 'a', 'very', 'good', 'example']

and I bet .NET has something very similar. In fact, according to the .NET docs, \w is supported, so all that's left is the .NET counterpart of re.findall, which is Regex.Matches in System.Text.RegularExpressions.
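
One caveat with a bare \w+: it also returns the digits of "8:30" as tokens and splits contractions such as "don't" in two. Here is a minimal sketch of a slightly stricter tokenizer, still in Python and using only the standard re module. The function name words_to_check and the pattern itself are illustrative assumptions, not part of NHunspell or any other library, and the pattern would likely need tuning for real text:

```python
import re

# Assumed pattern: a run of letters, optionally continued by an internal
# apostrophe or hyphen plus more letters ("don't", "well-known").
# [^\W\d_] means "a word character that is not a digit or underscore",
# i.e. a letter. The \b anchors reject fragments of mixed tokens like "3rd".
WORD = re.compile(r"\b[^\W\d_]+(?:['\-][^\W\d_]+)*\b")

def words_to_check(text):
    """Return only the tokens that are worth passing to a spell checker."""
    return WORD.findall(text)

print(words_to_check("This is a (very good) example"))
# ['This', 'is', 'a', 'very', 'good', 'example']
print(words_to_check("Meet at 8:30, don't be late"))
# ['Meet', 'at', "don't", 'be', 'late']
```

Times, fractions and other numeric tokens are simply never produced, so they never reach the checker. Whether hyphenated words should stay whole or be split depends on the dictionary in use, so treat that part of the pattern as a choice rather than a rule.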

answered Sep 28 '22 02:09

Fred Foo

Tags: .net, artificial-intelligence, nlp, spell-checking, hunspell
