How to stop NLTK stemmer from removing the trailing "e"?

I'm using an NLTK stemmer to reduce grammatical variations of a word to its stem. However, both the Porter and Snowball stemmers remove the trailing "e" from the base form of a noun or verb, e.g., "profile" becomes "profil".
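A minimal sketch that reproduces the behavior, assuming NLTK is installed. `PorterStemmer` and `SnowballStemmer` are the two stemmer classes from `nltk.stem`; the word list is only illustrative.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

# Both algorithms drop a final "e" when the remaining stem is long enough,
# so the base form and its inflections all collapse to "profil".
for word in ["profile", "profiles", "profiled"]:
    print(word, porter.stem(word), snowball.stem(word))
```

This is intended behavior: the stemmers aim to map every form to one shared key, not to a real dictionary word.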

How can I prevent this? I know I could add a conditional to guard against it, but that would obviously fail in other cases.
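To make the "conditional" idea concrete, here is a minimal sketch of such a guard. `stem_keep_e` is a hypothetical helper, not part of NLTK. It restores the "e" for "profile", but the check never fires for "profiles", so two forms of the same word end up with different stems. That is exactly what stemming is supposed to prevent.

```python
from nltk.stem import PorterStemmer

porter = PorterStemmer()


def stem_keep_e(word: str) -> str:
    """Hypothetical helper: re-append "e" if the input ends in "e" but its stem does not."""
    stem = porter.stem(word)
    if word.lower().endswith("e") and not stem.endswith("e"):
        stem += "e"
    return stem


print(stem_keep_e("profile"))   # profile
print(stem_keep_e("profiles"))  # profil (the input ends in "s", so the guard is skipped)
```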

Is there an option or another API for what I want?

asked Jul 01 '14 19:07

kakyo

1 Answer

I agree with Philip that the goal of a stemmer is to retain only the stem. For this particular case, you can try a lemmatizer instead of a stemmer. It retains more of the word and is designed to reduce the different inflected forms of a word to its dictionary form, e.g., 'profiles' --> 'profile'. NLTK has a class for this: try WordNetLemmatizer() from nltk.stem.

Beware that it's still not perfect (nothing is when working with text): for example, I used to get 'physic' from 'physics'.

answered Nov 14 '22 22:11

Everst

Tags: python, nlp, nltk