
How to invert the lemmatization process given a lemma and a token?

Generally, in natural language processing, we want to get the lemma of a token.

For example, we can map 'eaten' to 'eat' using wordnet lemmatization.

Are there any tools in Python that can inflect a lemma into a given form?

For example, we map 'go' to 'gone' given the target form 'eaten'.

PS: Someone mentioned that we have to store such mappings; see "How to un-stem a word in Python?"
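To make the "store such mappings" idea concrete, here is a minimal sketch. It assumes a hand-written table keyed by (lemma, Penn Treebank tag); the table `FORMS` and the helpers `tag_of` and `inflect_like` are hypothetical names made up for illustration, not part of any library. A real table would have to be built from a tagged corpus or a lexicon.

```python
# Hypothetical, hand-written table: (lemma, Penn Treebank tag) -> inflected form.
# The entries are illustrative only, not taken from any real resource.
FORMS = {
    ("eat", "VBN"): "eaten",
    ("eat", "VBD"): "ate",
    ("go", "VBN"): "gone",
    ("go", "VBD"): "went",
}


def tag_of(token, lemma):
    """Return the tag under which `token` is stored as a form of `lemma`, or None."""
    for (stored_lemma, tag), form in FORMS.items():
        if stored_lemma == lemma and form == token:
            return tag
    return None


def inflect_like(lemma, example_token, example_lemma):
    """Inflect `lemma` into the same form that `example_token` is of `example_lemma`."""
    tag = tag_of(example_token, example_lemma)
    if tag is None:
        return None  # the example form is not in the table
    return FORMS.get((lemma, tag))  # None if the lemma has no stored form for this tag


print(inflect_like("go", "eaten", "eat"))  # prints: gone
```

The lookup only works for forms that are already in the table, which is exactly the limitation of this approach: unseen lemmas or forms return None.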

Shifeng.Liu asked Aug 09 '17 12:08


1 Answer

Turning a base form such as a lemma into a situation-appropriate form is called realization (or "surface realization"). Example from Wikipedia:

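// Setup, assuming SimpleNLG v4 (imports from simplenlg.framework, .lexicon, .realiser.english, .phrasespec, .features)
Lexicon lexicon = Lexicon.getDefaultLexicon();
NLGFactory nlgFactory = new NLGFactory(lexicon);
Realiser realiser = new Realiser(lexicon);
NPPhraseSpec subject = nlgFactory.createNounPhrase("the", "woman");
subject.setPlural(true);
SPhraseSpec sentence = nlgFactory.createClause(subject, "smoke");
sentence.setFeature(Feature.NEGATED, true);
System.out.println(realiser.realiseSentence(sentence));
// output: "The women do not smoke."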

Libraries for this are used less often than lemmatizers, which generally means you have fewer options and are less likely to find a well-developed library. The Wikipedia example is in Java because the most popular library supporting this, SimpleNLG, is written in Java.

A quick search found pynlg, though it doesn't seem to be actively maintained. Alternatively, you can use SimpleNLG via an HTTP JSON interface provided by the Python library nlgserv.

polm23 answered Sep 21 '22 07:09