I have a string, and I want to lemmatize it.
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
words = "i want better dogs".split(" ")
print(" ".join(wordnet_lemmatizer.lemmatize(w) for w in words))
I get this output:
i want better dog
When I lemmatize each word with pos='a' instead:
print(" ".join(wordnet_lemmatizer.lemmatize(w, pos='a') for w in words))
I get this output:
i want good dogs
What I want is a combination of both, i.e. **i want good dog**.
How can I do that? Is there a way to pass multiple POS values, such as v (verb) and n (noun)?
What you want is to first run a POS tagger on your text to find each word's part of speech, and then lemmatize each word accordingly.
POS-tag the text with nltk.tag:
>>> from nltk.tag import pos_tag
>>> from nltk.tokenize import word_tokenize
>>> pos_tag(word_tokenize("i want better dogs"))
[('i', 'NN'), ('want', 'VBP'), ('better', 'JJR'), ('dogs', 'NNS')]
Then you can check whether each tag starts with NN, JJ, or VB, and leave words with any other tag unchanged:
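from nltk.stem import WordNetLemmatizer
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
# Lemmatize each word with the WordNet POS that matches its Penn Treebank tag.
def lemmatize_all(sentence):
    wnl = WordNetLemmatizer()
    for word, tag in pos_tag(word_tokenize(sentence)):
        if tag.startswith('NN'):
            yield wnl.lemmatize(word, pos='n')  # noun
        elif tag.startswith('VB'):
            yield wnl.lemmatize(word, pos='v')  # verb
        elif tag.startswith('JJ'):
            yield wnl.lemmatize(word, pos='a')  # adjective
        else:
            # Any other tag (pronoun, determiner, ...) is passed through unchanged.
            yield word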
print(' '.join(lemmatize_all("i want better dogs")))
# prints 'i want good dog'
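A minimal sketch of the same idea, split into two steps: mapping a Penn Treebank tag to a WordNet POS letter, and lemmatizing already-tagged (word, tag) pairs with whatever lemmatize function is passed in. The names penn_to_wordnet and lemmatize_tagged are hypothetical helpers, not part of NLTK. The letters 'n', 'v' and 'a' are the POS values used with WordNetLemmatizer.lemmatize in the answer above. Passing the lemmatizer as a parameter lets you check the tag logic without the WordNet data; the demo uses a tiny stand-in dictionary, not real WordNet output.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Same prefix rules as lemmatize_all: NN* -> noun, VB* -> verb, JJ* -> adjective.
_PREFIX_TO_POS = (("NN", "n"), ("VB", "v"), ("JJ", "a"))


def penn_to_wordnet(tag: str) -> Optional[str]:
    """Return the WordNet POS letter for a Penn Treebank tag, or None to skip it."""
    for prefix, pos in _PREFIX_TO_POS:
        if tag.startswith(prefix):
            return pos
    return None


def lemmatize_tagged(
    tagged: Iterable[Tuple[str, str]],
    lemmatize: Callable[[str, str], str],
) -> Iterator[str]:
    """Lemmatize each (word, tag) pair; words with other tags pass through unchanged."""
    for word, tag in tagged:
        pos = penn_to_wordnet(tag)
        yield lemmatize(word, pos) if pos is not None else word


if __name__ == "__main__":
    # Tags copied from the pos_tag output shown earlier.
    tagged = [("i", "NN"), ("want", "VBP"), ("better", "JJR"), ("dogs", "NNS")]

    # Hypothetical stand-in for WordNetLemmatizer().lemmatize, covering only this demo.
    fake_lemmas = {("dogs", "n"): "dog", ("better", "a"): "good"}

    def fake_lemmatize(word: str, pos: str) -> str:
        return fake_lemmas.get((word, pos), word)

    print(" ".join(lemmatize_tagged(tagged, fake_lemmatize)))  # i want good dog
```

With NLTK and its data installed, the real call would be lemmatize_tagged(pos_tag(word_tokenize(sentence)), WordNetLemmatizer().lemmatize), since lemmatize accepts the POS letter as its second positional argument.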