I am new to spaCy and I want to use its lemmatizer, but I don't know how to use it. I'd like to pass in a string of words and get back the same string with each word reduced to its base form.
Examples: 'words' => 'word', 'did' => 'do'
Thank you.
In spaCy, tokenizing a text into segments of words and punctuation is done in several steps, processing the text from left to right. First, the tokenizer splits the text on whitespace, similar to the split() function. Then the tokenizer checks whether the substring matches any tokenizer exception rules.
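For illustration, here is a minimal sketch of that behaviour, using the "Let's go to N.Y.!" example from the spaCy docs (the 'en' model shorthand matches the snippets below and assumes the model is downloaded):
import spacy

nlp = spacy.load('en')
doc = nlp(u"Let's go to N.Y.!")
# whitespace split happens first, then exception rules kick in:
# "Let's" -> "Let" + "'s", and "N.Y.!" -> "N.Y." + "!"
print([token.text for token in doc])
# ['Let', "'s", 'go', 'to', 'N.Y.', '!']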
In order to lemmatize with NLTK, you need to create an instance of WordNetLemmatizer() and call the lemmatize() function on a single word. Let's lemmatize a simple sentence: we first tokenize the sentence into words using nltk.word_tokenize and then call the lemmatizer on each token.
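Here is a minimal sketch of that NLTK approach (the sentence is just an illustration, and the nltk.download() calls are one-time data downloads):
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('punkt')    # tokenizer data
nltk.download('wordnet')  # lemmatizer data

lemmatizer = WordNetLemmatizer()
words = nltk.word_tokenize("The striped bats are hanging on their feet")
print([lemmatizer.lemmatize(w) for w in words])
# ['The', 'striped', 'bat', 'are', 'hanging', 'on', 'their', 'foot']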
It might be surprising, but spaCy doesn't contain any function for stemming, as it relies on lemmatization only.
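To see the difference, here is a small sketch contrasting NLTK's PorterStemmer (shown only for comparison; it is not part of spaCy) with spaCy's lemma:
from nltk.stem import PorterStemmer
import spacy

stemmer = PorterStemmer()
nlp = spacy.load('en')

# a stemmer crudely strips suffixes; a lemmatizer returns a dictionary form
print(stemmer.stem("studies"))              # studi
print([t.lemma_ for t in nlp(u"studies")])  # ['study']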
Previous answer is convoluted and can't be edited, so here's a more conventional one.
# make sure you downloaded the English model with "python -m spacy download en"
import spacy
nlp = spacy.load('en')
doc = nlp(u"Apples and oranges are similar. Boots and hippos aren't.")
for token in doc:
    print(token, token.lemma, token.lemma_)
Output:
Apples 6617 apples
and 512 and
oranges 7024 orange
are 536 be
similar 1447 similar
. 453 .
Boots 4622 boot
and 512 and
hippos 98365 hippo
are 536 be
n't 538 not
. 453 .
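To get a single string of base forms back, as the question asks, you can join the lemmas from the doc above:
print(" ".join(token.lemma_ for token in doc))
# apples and orange be similar . boot and hippo be not .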
From the official Lightning tour
If you want to use just the Lemmatizer, you can do that in the following way:
from spacy.lemmatizer import Lemmatizer
from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES
lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)
lemmas = lemmatizer(u'ducks', u'NOUN')
print(lemmas)
Output
['duck']
Update
Since spaCy version 2.2, LEMMA_INDEX, LEMMA_EXC, and LEMMA_RULES have been bundled into a Lookups object:
import spacy
nlp = spacy.load('en')
nlp.vocab.lookups
>>> <spacy.lookups.Lookups object at 0x7f89a59ea810>
nlp.vocab.lookups.tables
>>> ['lemma_lookup', 'lemma_rules', 'lemma_index', 'lemma_exc']
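Each of those tables can be fetched by name if you want to inspect the data, e.g. (a small sketch using the Lookups API; the 'noun' key follows the format of the English lemma rules):
lemma_rules = nlp.vocab.lookups.get_table('lemma_rules')
# a spacy.lookups.Table behaves like a dict, here keyed by coarse POS
print(lemma_rules.get('noun'))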
You can still use the lemmatizer directly with a word and a POS (part of speech) tag:
from spacy.lemmatizer import Lemmatizer, ADJ, NOUN, VERB
lemmatizer = nlp.vocab.morphology.lemmatizer
lemmatizer('ducks', NOUN)
>>> ['duck']
You can pass the POS tag as the imported constant like above, or as a string:
lemmatizer('ducks', 'NOUN')
>>> ['duck']
Code:
# note: this uses the old pre-1.0 spaCy API (spacy.en)
import os
from spacy.en import English, LOCAL_DATA_DIR

data_dir = os.environ.get('SPACY_DATA', LOCAL_DATA_DIR)
nlp = English(data_dir=data_dir)
doc3 = nlp(u"this is spacy lemmatize testing. programming books are more better than others")
for token in doc3:
    print(token, token.lemma, token.lemma_)
Output:
this 496 this
is 488 be
spacy 173779 spacy
lemmatize 1510965 lemmatize
testing 2900 testing
. 419 .
programming 3408 programming
books 1011 book
are 488 be
more 529 more
better 615 better
than 555 than
others 871 others
Example Ref: here
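Since that example uses the old pre-1.0 spacy.en API, a rough equivalent on a current install would be (assuming the en_core_web_sm model is downloaded):
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(u"this is spacy lemmatize testing. programming books are more better than others")
for token in doc:
    print(token.text, token.lemma, token.lemma_)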