Spelling libraries (like hunspell) in UWP Applications?

I am porting an application for writers to the UWP platform. The only piece of the puzzle I have left is the NHunspell library. I use it extensively for spell checking and thesaurus features. I've customized the heck out of it and created custom dictionaries for various things (e.g., a different dictionary for each writing project). This library is a beautiful thing.

However, I can't seem to include this DLL in my UWP application.

1) Is there a way to force the usage of this DLL? I really do like how the NHunspell system is set up. It makes sense and is very fast and easy to use.

2) If not, can anyone recommend a better solution for custom dictionaries, customized spell checking, etc?

Update 3

After considerable searching and reading online, I found an article discussing the theory of spell checking. Here is one quick example (the one I used the most).

http://www.anotherchris.net/csharp/how-to-write-a-spelling-corrector-in-csharp/

After reading this article, taking that base code, and stripping the English words from the Hunspell .dic files, I have created my own spell-checking library that works in UWP.

Once I get it solidified, I will post it as an answer below to donate to the SO community. :)

Update 2

I'm conceding the use of Hunspell. It doesn't look like it is possible at all... are there any other libraries/packages that anyone can suggest?

UPDATE :

I probably need to rephrase the statement that I can't include the DLL: I cannot include the DLL through NuGet. It complains that the DLL is not compatible with the UAP/UWP platform.

I am able to MANUALLY include the DLL in my project by linking to an existing DLL (not NuGet). However, that DLL does indeed prove to be incompatible with the UAP platform. A simple call to spell-check a word works fine in WinForms, but in the UWP app it immediately crashes with System.IO.FileNotFoundException.

The constructor of NHunspell does reach out to load the associated .dic and .aff files. However, I have mitigated this by loading the files into memory and then calling the alternate constructor, which takes a byte array instead of a file name for each of those files. It still crashes, but with a new "Method not found" error:

String System.AppDomain.get_RelativeSearchPath()

I am looking for any spell checking engine that will work within the UAP framework. I would prefer for it to be NHunspell simply for familiarity reasons. However, I'm not blind to the fact that this is becoming an increasingly unlikely option.

People I work with have suggested that I use the built-in spellchecking options. However, I can't use the built-in Windows 10/TextBox spell checking features (that I know of), because I can't control custom dictionaries and I can't disable things like auto-capitalize and word-replacement (where it replaces the word for you if it thinks it is close enough to the right guess). Those things are chapter-suicide for writers! A writer can turn them off at the OS level, but they may want them on for other apps, just not this one.

Please let me know if there is a work-around for NHunspell. And if you don't know of a work-around, please let me know your best replacement custom spellcheck engine that works within the UAP framework.

As a side note, I also use NHunspell for its thesaurus capability. It works very well in my Windows apps. I would have to replace this functionality as well, hopefully with the same engine as the spellcheck engine. However, if you know of a good thesaurus engine (even one that doesn't spell check), that's good too!

Thank you!!

Jerry asked Mar 18 '16 00:03


2 Answers

I downloaded the source code of the NHunspell library and tried to build a library with UWP support, but I ran into problems with the marshalling (Marshalling.cs).
The package loads DLLs that only work on x86 and x64 architectures, so on ARM (mobiles, tablets) the app will not work.
The package loads the DLLs with system calls:

    [DllImport("kernel32.dll")]
    internal static extern IntPtr LoadLibrary(string fileName);

and I think that it needs to be rewritten to work in UWP, because UWP uses sandboxing.

IMHO there are only two options:
1) Rewrite the Marshalling class within the restrictions of UWP.
2) Don't use Hunspell in your program.

I don't have much experience with DLLs in UWP, but I believe that the rewrite could be very difficult.

ganchito55 answered Nov 01 '22 14:11


As promised, here is the class I built to do my spell checking.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

namespace Com.HanelDev.HSpell
{
    public class HSpellProcess
    {
        private Dictionary<string, string> _dictionary = new Dictionary<string, string>();

        public int MaxSuggestionResponses { get; set; }

        public HSpellProcess()
        {
            MaxSuggestionResponses = 10;
        }

        public void AddToDictionary(string w)
        {
            if (!_dictionary.ContainsKey(w.ToLower()))
            {
                _dictionary.Add(w.ToLower(), w);
            }
            else
            {
                // Upper case words are more specific (but may be the first word
                // in a sentence.) Lower case words are more generic.
                // If you put an upper-case word in the dictionary, then for
                // it to be "correct" it must match case. This is not true
                // for lower-case words.
                // We want to only replace existing words with their more
                // generic versions, not the other way around.
                if (_dictionary[w.ToLower()].CaseSensitive())
                {
                    _dictionary[w.ToLower()] = w;
                }
            }
        }

        public void LoadDictionary(byte[] dictionaryFile, bool resetDictionary = false)
        {
            if (resetDictionary)
            {
                _dictionary = new Dictionary<string, string>();
            }
            using (MemoryStream ms = new MemoryStream(dictionaryFile))
            {
                using (StreamReader sr = new StreamReader(ms))
                {
                    string tmp = sr.ReadToEnd();
                    tmp = tmp.Replace("\r\n", "\r").Replace("\n", "\r");
                    string [] fileData = tmp.Split("\r".ToCharArray());

                    foreach (string line in fileData)
                    {
                        if (string.IsNullOrWhiteSpace(line) || line.StartsWith("#"))
                        {
                            continue;
                        }

                        string word = line;

                        // I added all of this for file imports (not array imports)
                        // to be able to handle words from Hunspell dictionaries.
                        // I don't get the hunspell derivatives, but at least I get
                        // the root word.
                        if (line.Contains("/"))
                        {
                            string[] arr = line.Split("/".ToCharArray());
                            word = arr[0];
                        }

                        AddToDictionary(word);
                    }
                }
            }
        }

        public void LoadDictionary(Stream dictionaryFileStream, bool resetDictionary = false)
        {
            string s = "";
            using (StreamReader sr = new StreamReader(dictionaryFileStream))
            {
                s = sr.ReadToEnd();
            }

            byte [] bytes = Encoding.UTF8.GetBytes(s);

            LoadDictionary(bytes, resetDictionary);
        }

        public void LoadDictionary(List<string> words, bool resetDictionary = false)
        {
            if (resetDictionary)
            {
                _dictionary = new Dictionary<string, string>();
            }

            foreach (string line in words)
            {
                if (string.IsNullOrWhiteSpace(line) || line.StartsWith("#"))
                {
                    continue;
                }

                AddToDictionary(line);
            }
        }

        public string ExportDictionary()
        {
            StringBuilder sb = new StringBuilder();

            foreach (string k in _dictionary.Keys)
            {
                sb.AppendLine(_dictionary[k]);
            }

            return sb.ToString();
        }

        public HSpellCorrections Correct(string word)
        {
            HSpellCorrections ret = new HSpellCorrections();
            ret.Word = word;

            if (_dictionary.ContainsKey(word.ToLower()))
            {
                string testWord = word;
                string dictWord = _dictionary[word.ToLower()];
                if (!dictWord.CaseSensitive())
                {
                    testWord = testWord.ToLower();
                    dictWord = dictWord.ToLower();
                }

                if (testWord == dictWord)
                {
                    ret.SpelledCorrectly = true;
                    return ret;
                }
            }

            // At this point, we know the word is assumed to be spelled incorrectly. 
            // Go get word candidates.
            ret.SpelledCorrectly = false;

            Dictionary<string, HSpellWord> candidates = new Dictionary<string, HSpellWord>();

            List<string> edits = Edits(word);

            GetCandidates(candidates, edits);

            if (candidates.Count > 0)
            {
                return BuildCandidates(ret, candidates);
            }

            // If we didn't find any candidates by the main word, look for second-level candidates based on the original edits.
            foreach (string item in edits)
            {
                List<string> round2Edits = Edits(item);

                GetCandidates(candidates, round2Edits);
            }

            if (candidates.Count > 0)
            {
                return BuildCandidates(ret, candidates);
            }

            return ret;
        }

        private void GetCandidates(Dictionary<string, HSpellWord> candidates, List<string> edits)
        {
            foreach (string wordVariation in edits)
            {
                if (_dictionary.ContainsKey(wordVariation.ToLower()) &&
                    !candidates.ContainsKey(wordVariation.ToLower()))
                {
                    HSpellWord suggestion = new HSpellWord(_dictionary[wordVariation.ToLower()]);

                    suggestion.RelativeMatch = RelativeMatch.Compute(wordVariation, suggestion.Word);

                    candidates.Add(wordVariation.ToLower(), suggestion);
                }
            }
        }

        private HSpellCorrections BuildCandidates(HSpellCorrections ret, Dictionary<string, HSpellWord> candidates)
        {
            var suggestions = candidates.OrderByDescending(c => c.Value.RelativeMatch);

            int x = 0;

            ret.Suggestions.Clear();
            foreach (var suggest in suggestions)
            {
                x++;
                ret.Suggestions.Add(suggest.Value.Word);

                // only suggest the first X words.
                if (x >= MaxSuggestionResponses)
                {
                    break;
                }
            }

            return ret;
        }

        private List<string> Edits(string word)
        {
            var splits = new List<Tuple<string, string>>();
            var transposes = new List<string>();
            var deletes = new List<string>();
            var replaces = new List<string>();
            var inserts = new List<string>();

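            // Splits. The final split (word, "") is included so that the
            // Inserts step below can also append a letter at the end of the word.
            for (int i = 0; i <= word.Length; i++)
            {
                var tuple = new Tuple<string, string>(word.Substring(0, i), word.Substring(i));
                splits.Add(tuple);
            }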

            // Deletes
            for (int i = 0; i < splits.Count; i++)
            {
                string a = splits[i].Item1;
                string b = splits[i].Item2;
                if (!string.IsNullOrEmpty(b))
                {
                    deletes.Add(a + b.Substring(1));
                }
            }

            // Transposes
            for (int i = 0; i < splits.Count; i++)
            {
                string a = splits[i].Item1;
                string b = splits[i].Item2;
                if (b.Length > 1)
                {
                    transposes.Add(a + b[1] + b[0] + b.Substring(2));
                }
            }

            // Replaces
            for (int i = 0; i < splits.Count; i++)
            {
                string a = splits[i].Item1;
                string b = splits[i].Item2;
                if (!string.IsNullOrEmpty(b))
                {
                    for (char c = 'a'; c <= 'z'; c++)
                    {
                        replaces.Add(a + c + b.Substring(1));
                    }
                }
            }

            // Inserts
            for (int i = 0; i < splits.Count; i++)
            {
                string a = splits[i].Item1;
                string b = splits[i].Item2;
                for (char c = 'a'; c <= 'z'; c++)
                {
                    inserts.Add(a + c + b);
                }
            }

            return deletes.Union(transposes).Union(replaces).Union(inserts).ToList();
        }

        public HSpellCorrections CorrectFrom(string txt, int idx)
        {
            if (idx >= txt.Length)
            {
                return null;
            }

            // Find the next incorrect word.
            string substr = txt.Substring(idx);
            int idx2 = idx;

            List<string> str = substr.Split(StringExtensions.WordDelimiters).ToList();

            foreach (string word in str)
            {
                string tmpWord = word;

                if (string.IsNullOrEmpty(word))
                {
                    idx2++;
                    continue;
                }

                // If we have a possessive form, strip the 's off before testing
                // the word. This will solve issues like "My [mother's] favorite ring."
                if (tmpWord.EndsWith("'s"))
                {
                    tmpWord = word.Substring(0, tmpWord.Length - 2);
                }

                // Skip things like ***, #HashTagsThatMakeNoSense and 1,2345.67
                if (!tmpWord.IsWord())
                {
                    idx2 += word.Length + 1;
                    continue;
                }

                HSpellCorrections cor = Correct(tmpWord);

                if (cor.SpelledCorrectly)
                {
                    idx2 += word.Length + 1;
                }
                else
                {
                    cor.Index = idx2;
                    return cor;
                }
            }

            return null;
        }
    }
}
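
The class above refers to several types that are not shown in this answer: HSpellCorrections, HSpellWord, RelativeMatch, and the string extensions CaseSensitive(), IsWord() and WordDelimiters. The following is only a minimal sketch of what they might look like so that the class compiles. The member names are taken from how HSpellProcess uses them; everything else (the property types, the delimiter set, and the use of a normalized Levenshtein distance as the match score) is an assumption, not the original author's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Com.HanelDev.HSpell
{
    // Hypothetical result object; property names come from HSpellProcess,
    // the types are guesses.
    public class HSpellCorrections
    {
        public string Word { get; set; }
        public bool SpelledCorrectly { get; set; }
        public int Index { get; set; }
        public List<string> Suggestions { get; } = new List<string>();
    }

    // Hypothetical suggestion wrapper: a dictionary word plus its score.
    public class HSpellWord
    {
        public HSpellWord(string word) { Word = word; }
        public string Word { get; }
        public double RelativeMatch { get; set; }
    }

    public static class RelativeMatch
    {
        // Assumed scoring: 1 - (Levenshtein distance / longer length),
        // so 1.0 means identical and higher means closer.
        public static double Compute(string a, string b)
        {
            if (a.Length == 0 && b.Length == 0) return 1.0;

            int[,] d = new int[a.Length + 1, b.Length + 1];
            for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
            for (int j = 0; j <= b.Length; j++) d[0, j] = j;

            for (int i = 1; i <= a.Length; i++)
            {
                for (int j = 1; j <= b.Length; j++)
                {
                    int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                    d[i, j] = Math.Min(
                        Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                        d[i - 1, j - 1] + cost);
                }
            }

            return 1.0 - (double)d[a.Length, b.Length] / Math.Max(a.Length, b.Length);
        }
    }

    public static class StringExtensions
    {
        // Assumed delimiter set; each delimiter is a single character,
        // which matches the "+ 1" index bookkeeping in CorrectFrom.
        public static readonly char[] WordDelimiters =
            { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')' };

        // A word is treated as case sensitive if it contains any upper-case letter.
        public static bool CaseSensitive(this string word)
        {
            return word.Any(c => char.IsUpper(c));
        }

        // Letters, apostrophes and hyphens only; rejects "***", "#tag", "1,234.5".
        public static bool IsWord(this string word)
        {
            return !string.IsNullOrEmpty(word)
                && word.All(c => char.IsLetter(c) || c == '\'' || c == '-');
        }
    }
}
```

With helpers along these lines in place, typical usage would be to create an HSpellProcess, call LoadDictionary with the contents of a word list, and then call Correct(word) or CorrectFrom(text, index) and read SpelledCorrectly and Suggestions from the result.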

Jerry answered Nov 01 '22 13:11