 

How to use the spaCy lemmatizer to get a word into its basic form

I am new to spaCy and I want to use its lemmatizer function, but I don't know how to use it. I'd like to pass in a string of words and get back the same string with each word reduced to its basic form.

Examples:

  • 'words'=> 'word'
  • 'did' => 'do'

Thank you.

yi wang asked Aug 04 '16 09:08


People also ask

How do you Tokenize words in spaCy?

In spaCy, tokenizing a text into segments of words and punctuation is done in several steps. The text is processed from left to right. First, the tokenizer splits the text on whitespace, similar to the split() function. Then it checks whether each substring matches a tokenizer exception rule. For instance, a minimal tokenization sketch is shown below.
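A small sketch of this (assuming a recent spaCy install with the en_core_web_sm model; older versions load the model as 'en', as in the answers below):

import spacy

nlp = spacy.load('en_core_web_sm')  # assumption: small English model is installed
doc = nlp("Let's go to N.Y.!")

# each token reflects spaCy's whitespace, punctuation and exception rules
print([token.text for token in doc])
# roughly: ['Let', "'s", 'go', 'to', 'N.Y.', '!']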

How do you use Wordnet Lemmatizer?

In order to lemmatize, you create an instance of WordNetLemmatizer() and call its lemmatize() function on a single word. To lemmatize a whole sentence, first tokenize it into words with nltk.word_tokenize and then call lemmatizer.lemmatize() on each token.
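A small sketch of that NLTK workflow (assuming the punkt and wordnet data have been downloaded):

from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# assumption: nltk.download('punkt') and nltk.download('wordnet') have been run
lemmatizer = WordNetLemmatizer()
tokens = word_tokenize("The striped bats are hanging on their feet")

# lemmatize() defaults to the noun POS; pass pos='v' for verbs, etc.
print([lemmatizer.lemmatize(token) for token in tokens])
# roughly: ['The', 'striped', 'bat', 'are', 'hanging', 'on', 'their', 'foot']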

Does spaCy have Stemming?

It might be surprising, but spaCy doesn't contain any function for stemming; it relies on lemmatization only. A quick comparison is sketched below.
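To see the difference, here is a small comparison sketch using NLTK's PorterStemmer for stemming next to spaCy's lemma_ attribute (the model name is an assumption):

import spacy
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
nlp = spacy.load('en_core_web_sm')  # assumption: small English model is installed

word = "studies"
print(stemmer.stem(word))     # stemming chops suffixes: 'studi'
print(nlp(word)[0].lemma_)    # lemmatization returns a dictionary form: 'study'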


3 Answers

The previous answer is convoluted and can't be edited, so here's a more conventional one.

# make sure you downloaded the English model with "python -m spacy download en"
import spacy

nlp = spacy.load('en')
doc = nlp(u"Apples and oranges are similar. Boots and hippos aren't.")

for token in doc:
    print(token, token.lemma, token.lemma_)

Output:

Apples 6617 apples
and 512 and
oranges 7024 orange
are 536 be
similar 1447 similar
. 453 .
Boots 4622 boot
and 512 and
hippos 98365 hippo
are 536 be
n't 538 not
. 453 .

From the official Lightning tour
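If you want the lemmatized text back as a single string, as the question asks, you can join the per-token lemmas (a small sketch building on the doc above; the exact lemmas depend on your spaCy version):

lemmatized = " ".join(token.lemma_ for token in doc)
print(lemmatized)
# roughly: "apples and orange be similar . boot and hippo be not ."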

damio answered Sep 23 '22 03:09


If you want to use just the Lemmatizer, you can do that in the following way:

from spacy.lemmatizer import Lemmatizer
from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES

lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)
lemmas = lemmatizer(u'ducks', u'NOUN')
print(lemmas)

Output:

['duck'] 

Update

Since spaCy version 2.2, LEMMA_INDEX, LEMMA_EXC, and LEMMA_RULES have been bundled into a Lookups object:

import spacy
nlp = spacy.load('en')

nlp.vocab.lookups
>>> <spacy.lookups.Lookups object at 0x7f89a59ea810>
nlp.vocab.lookups.tables
>>> ['lemma_lookup', 'lemma_rules', 'lemma_index', 'lemma_exc']

You can still use the lemmatizer directly with a word and a POS (part of speech) tag:

from spacy.lemmatizer import Lemmatizer, ADJ, NOUN, VERB

lemmatizer = nlp.vocab.morphology.lemmatizer
lemmatizer('ducks', NOUN)
>>> ['duck']

You can pass the POS tag as the imported constant, as above, or as a string:

lemmatizer('ducks', 'NOUN')
>>> ['duck']


joel answered Sep 23 '22 03:09


Code:

# note: this uses the spacy.en API from early spaCy releases
import os
from spacy.en import English, LOCAL_DATA_DIR

data_dir = os.environ.get('SPACY_DATA', LOCAL_DATA_DIR)
nlp = English(data_dir=data_dir)

doc3 = nlp(u"this is spacy lemmatize testing. programming books are more better than others")

for token in doc3:
    print(token, token.lemma, token.lemma_)

Output:

this 496 this
is 488 be
spacy 173779 spacy
lemmatize 1510965 lemmatize
testing 2900 testing
. 419 .
programming 3408 programming
books 1011 book
are 488 be
more 529 more
better 615 better
than 555 than
others 871 others

Example Ref: here

RAVI answered Sep 23 '22 03:09