
Apply multiple pos arguments in lemmatization

Tags:

python-3.x

I have a string. I want to apply lemmatization on it.

from nltk.stem import WordNetLemmatizer

wordnet_lemmatizer = WordNetLemmatizer()
words = "i want better dogs".split(" ")
for w in words:
    print(wordnet_lemmatizer.lemmatize(w), end=" ")

I get this output:

i want better dog

When I run this loop:

for w in "i want better dogs".split(" "):
    print(wordnet_lemmatizer.lemmatize(w, pos='a'), end=" ")

I'm getting this output:

i want good dogs

What I want is the combination of both loops, i.e. **i want good dog**. How can I do it? Is there a way to pass multiple pos values, such as v (verb) and n (noun)?

Shubham R asked Feb 06 '23 04:02


1 Answer

What you want is to first run a POS tagger on your text to find each word's part of speech, and then lemmatize each word accordingly.

POS tag using nltk.tag:

>>> from nltk.tag import pos_tag
>>> from nltk.tokenize import word_tokenize
>>> pos_tag(word_tokenize("i want better dogs"))
[('i', 'NN'), ('want', 'VBP'), ('better', 'JJR'), ('dogs', 'NNS')]

Then you can check whether a tag starts with NN, JJ, or VB, and ignore all other tags:

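from nltk.stem import WordNetLemmatizer
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize

def lemmatize_all(sentence):
    wnl = WordNetLemmatizer()
    # Tag every token, then pass the matching WordNet pos to the lemmatizer.
    for word, tag in pos_tag(word_tokenize(sentence)):
        if tag.startswith('NN'):    # nouns
            yield wnl.lemmatize(word, pos='n')
        elif tag.startswith('VB'):  # verbs
            yield wnl.lemmatize(word, pos='v')
        elif tag.startswith('JJ'):  # adjectives
            yield wnl.lemmatize(word, pos='a')
        else:
            # All other parts of speech are left unchanged.
            yield word

print(' '.join(lemmatize_all("i want better dogs")))
# prints 'i want good dog'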
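
As far as I know, lemmatize accepts only one pos value per call, which is why the generator above picks the pos word by word. If the if/elif chain gets unwieldy, the same idea can be written as a lookup table. This is only a sketch: TAG_PREFIX_TO_POS, penn_to_wordnet and lemmatize_tagged are hypothetical names, not part of NLTK, and the adverb entry ('r') is an addition that the code above does not handle.

```python
# Sketch: map Penn Treebank tags (as returned by nltk.pos_tag) to the
# single-letter pos codes used by WordNet. All names here are hypothetical
# helpers for illustration; none of them come from NLTK itself.
TAG_PREFIX_TO_POS = {
    "NN": "n",  # nouns
    "VB": "v",  # verbs
    "JJ": "a",  # adjectives
    "RB": "r",  # adverbs (an assumption added here, not in the answer above)
}


def penn_to_wordnet(tag):
    """Return the WordNet pos letter for a Penn Treebank tag, or None."""
    return TAG_PREFIX_TO_POS.get(tag[:2])


def lemmatize_tagged(tagged_words, lemmatize):
    """Lemmatize (word, tag) pairs by calling lemmatize(word, pos=...).

    Words whose tag has no WordNet equivalent are returned unchanged.
    """
    result = []
    for word, tag in tagged_words:
        pos = penn_to_wordnet(tag)
        result.append(lemmatize(word, pos=pos) if pos else word)
    return result
```

With NLTK installed and its data downloaded, this would be called as lemmatize_tagged(pos_tag(word_tokenize(sentence)), WordNetLemmatizer().lemmatize). Passing the lemmatizer in as an argument keeps the tag mapping itself free of any NLTK dependency, so it can be checked on its own.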
L3viathan answered Mar 27 '23 16:03