 

How to obtain all possible words from given hunspell dictionary?

I would like to parse the Hunspell-formatted .aff and .dic files supported by OpenOffice.

For example, the English .aff and .dic files can be downloaded from here: http://extensions.openoffice.org/en/project/english-dictionaries-apache-openoffice

I want to scan each line of the given .dic file and generate every possible word for that line using the provided .aff file.

How can I do that?

I have installed the NHunspell framework, but it does not provide that feature: https://www.nuget.org/packages/NHunspell/

For example, in English, consider:

make/UAGS

make can become make, made, makes, making, etc.
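Each character after the slash in make/UAGS is a flag that points to rules in the .aff file (mostly prefix and suffix classes). How that flag field splits depends on the FLAG directive in the .aff file: one character per flag by default, two characters with FLAG long, comma-separated numbers with FLAG num. A small sketch of that step in Python; split_flags is a hypothetical helper, not part of NHunspell or Hunspell:

```python
from typing import List


def split_flags(flag_field: str, flag_mode: str = "") -> List[str]:
    """Split the part after '/' in a .dic entry into individual flags.

    flag_mode mirrors the FLAG directive of the .aff file:
      ""      -> default, one character per flag   ("UAGS" -> U, A, G, S)
      "long"  -> two characters per flag            ("AaBb" -> Aa, Bb)
      "num"   -> comma-separated decimal numbers    ("101,102" -> 101, 102)
      "UTF-8" -> one Unicode character per flag
    """
    if not flag_field:
        return []
    if flag_mode == "long":
        return [flag_field[i:i + 2] for i in range(0, len(flag_field), 2)]
    if flag_mode == "num":
        return [flag for flag in flag_field.split(",") if flag]
    # Default and UTF-8 modes: each (already decoded) character is one flag.
    return list(flag_field)


print(split_flags("UAGS"))  # prints ['U', 'A', 'G', 'S']
```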

Now I need a parser that gives me all of these combinations. How can I obtain them? Thank you very much.

So basically, I want to scan each line of the dictionary and generate all possible words from the word on that line, and I don't know how to do that.
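Reading the .dic side line by line is the easy part. A minimal Python sketch, assuming the common layout (an approximate entry count on the first line, then one word or word/FLAGS per line, optionally followed by morphological fields); read_dic_entries is a hypothetical name, and escaped slashes inside words are not handled:

```python
import io
from typing import Iterator, TextIO, Tuple


def read_dic_entries(stream: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (word, flag_field) pairs from a Hunspell .dic stream.

    Assumes the usual layout: the first line holds an approximate entry count,
    every later line is "word" or "word/FLAGS", optionally followed by
    whitespace-separated morphological fields (ignored here).
    Escaped slashes inside words are NOT handled in this sketch.
    """
    for line_number, raw in enumerate(stream):
        line = raw.strip()
        if not line:
            continue
        if line_number == 0 and line.isdigit():
            continue  # skip the entry-count header
        entry = line.split()[0]  # drop fields such as "po:noun"
        word, _, flags = entry.partition("/")
        yield word, flags


sample = io.StringIO("3\nmake/UAGS\nanalyze/ADSG\ncat/S po:noun\n")
for word, flags in read_dic_entries(sample):
    print(word, flags)
```

With a real dictionary, open the .dic file using the encoding named by the SET directive in the matching .aff file.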

I could also write my own parser, but the rules seem pretty complex, and there is no detailed, easy-to-follow documentation for them.
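The PFX/SFX blocks in the .aff file do follow a fixed layout: a header line "SFX flag Y|N count", then count rule lines "SFX flag strip add condition", where 0 means an empty string and Y allows combining with affixes of the other kind (cross product). A minimal parser sketch that deliberately ignores everything else in the file (compounding, continuation classes, ICONV/OCONV, morphological fields); the class and function names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AffixRule:
    strip: str      # removed from the word ("0" in the file means nothing)
    add: str        # attached to the word ("0" in the file means nothing)
    condition: str  # simplified regex the word must match


@dataclass
class AffixClass:
    kind: str            # "PFX" or "SFX"
    flag: str
    cross_product: bool  # "Y": may combine with affixes of the other kind
    rules: List[AffixRule] = field(default_factory=list)


def parse_affixes(aff_text: str) -> Dict[str, AffixClass]:
    """Collect PFX/SFX classes from .aff text, ignoring all other directives."""
    classes: Dict[str, AffixClass] = {}
    for raw in aff_text.splitlines():
        parts = raw.split()
        if len(parts) < 4 or parts[0] not in ("PFX", "SFX"):
            continue  # comments, SET, TRY, REP, compounding options, ...
        kind, flag = parts[0], parts[1]
        if flag not in classes:
            # Header line: PFX|SFX flag Y|N rule_count
            classes[flag] = AffixClass(kind, flag, parts[2] == "Y")
            continue
        # Rule line: PFX|SFX flag strip add [condition [morphology...]]
        strip = "" if parts[2] == "0" else parts[2]
        add = parts[3].split("/")[0]  # drop continuation flags, e.g. "ing/X"
        add = "" if add == "0" else add
        condition = parts[4] if len(parts) > 4 else "."
        classes[flag].rules.append(AffixRule(strip, add, condition))
    return classes


sample_aff = """SET UTF-8
PFX A Y 1
PFX A 0 re .
SFX G Y 2
SFX G e ing e
SFX G 0 ing [^e]
"""
classes = parse_affixes(sample_aff)
print(classes["G"].rules)
```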

Here is basically what I want; the image explains it very clearly.

Given analyze/ADSG and the en.dic and en.aff files, I want to obtain all of the following words:

analyze, analyzes, analyzing, analyzed, reanalyze, reanalyzes, reanalyzing, reanalyzed
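To show how the affix rules produce that list, here is a simplified expansion sketch in Python. The AFFIXES table is an assumed excerpt modeled on the A, D, G and S classes of the English .aff file (check the real file before relying on it), conditions are passed straight to Python's re module, a prefix condition is checked against the form it is attached to, and cross products are generated only when both the prefix and the suffix class are marked Y. This is an illustration, not how Hunspell or unmunch is implemented internally:

```python
import re
from typing import Dict, List, Optional, Tuple

# ASSUMED excerpt modeled on the English .aff file:
# flag -> (kind, cross_product, [(strip, add, condition), ...]), "0" already mapped to "".
AFFIXES: Dict[str, Tuple[str, bool, List[Tuple[str, str, str]]]] = {
    "A": ("PFX", True, [("", "re", ".")]),
    "D": ("SFX", True, [("", "d", "e"), ("y", "ied", "[^aeiou]y"),
                        ("", "ed", "[^ey]"), ("", "ed", "[aeiou]y")]),
    "G": ("SFX", True, [("e", "ing", "e"), ("", "ing", "[^e]")]),
    "S": ("SFX", True, [("y", "ies", "[^aeiou]y"), ("", "s", "[aeiou]y"),
                        ("", "es", "[sxzh]"), ("", "s", "[^sxzhy]")]),
}


def apply_suffix(word: str, strip: str, add: str, cond: str) -> Optional[str]:
    """Condition is tested against the end of the word before stripping."""
    if re.search("(?:" + cond + ")$", word) and word.endswith(strip):
        return word[:len(word) - len(strip)] + add
    return None


def apply_prefix(word: str, strip: str, add: str, cond: str) -> Optional[str]:
    """Condition is tested against the start of the word it is attached to."""
    if re.match(cond, word) and word.startswith(strip):
        return add + word[len(strip):]
    return None


def expand(stem: str, flags: str) -> List[str]:
    """Return the stem plus every form its flags generate, without duplicates."""
    forms = [stem]
    suffixed: List[Tuple[str, bool]] = []  # (suffixed form, suffix class allows cross product)
    for flag in flags:
        kind, cross, rules = AFFIXES.get(flag, ("", False, []))
        if kind == "SFX":
            for strip, add, cond in rules:
                new = apply_suffix(stem, strip, add, cond)
                if new is not None:
                    suffixed.append((new, cross))
    forms.extend(form for form, _ in suffixed)
    for flag in flags:
        kind, cross, rules = AFFIXES.get(flag, ("", False, []))
        if kind != "PFX":
            continue
        for strip, add, cond in rules:
            new = apply_prefix(stem, strip, add, cond)
            if new is not None:
                forms.append(new)
            if not cross:
                continue
            # Cross product: prefix + suffix, only if both classes are marked Y.
            for form, suffix_cross in suffixed:
                combo = apply_prefix(form, strip, add, cond) if suffix_cross else None
                if combo is not None:
                    forms.append(combo)
    return list(dict.fromkeys(forms))  # dict keeps first-seen order


print(expand("analyze", "ADSG"))
# prints ['analyze', 'analyzed', 'analyzes', 'analyzing', 'reanalyze', 'reanalyzed', 'reanalyzes', 'reanalyzing']
```

Flags missing from the table, such as U in make/UAGS, are simply ignored; with this table, expand("make", "UAGS") yields make, makes, making, remake, remakes and remaking, so an irregular form like made would need its own rule or its own .dic entry.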


Asked Mar 02 '17 at 22:03 by MonsterMMORPG





1 Answer

If you want the entire word list (every form the dictionary can generate), you can run unmunch:

unmunch dictionary.dic dictionary.aff

Note that the current implementation of unmunch in Hunspell has limits on the maximum number of words, the number of affixes, and the length of generated words, so unmunch may fail if the target language exceeds those limits.

If you only want the list of words that can be generated from a single entry, you can use wordforms (note that it takes the .aff file first, while unmunch takes the .dic file first):

wordforms dictionary.aff dictionary.dic word
Answered Sep 30 '22 at 20:09 by Kartal Tabak
