I am using the Python library pattern to get the singular form of English nouns:
In [1]: from pattern.en import singularize
In [2]: singularize('patterns')
Out[2]: 'pattern'
In [3]: singularize('gases')
Out[3]: 'gase'
I am working around the problem in the second example by defining:
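from pattern.en import singularize

def my_singularize(strn):
    '''
    Return the singular of a noun. Add special cases to correct pattern's generic rules.
    '''
    exceptionDict = {'gases': 'gas', 'spectra': 'spectrum', 'cross': 'cross', 'nuclei': 'nucleus'}
    try:
        # Known irregular or mis-handled nouns are answered from the table.
        return exceptionDict[strn]
    except KeyError:
        # Everything else falls back to pattern's rule-based singularize.
        return singularize(strn)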
Is there a better way to do this, e.g. add to the rules of pattern, or make the exceptionDict somehow internal to pattern?
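One way to keep the exception table out of every call site, without touching pattern's internals, is to wrap the base function once. The sketch below is a minimal illustration: `make_singularizer` and `naive_singularize` are hypothetical names, and `naive_singularize` only stands in for `pattern.en.singularize` so the example runs without pattern installed. It does not register anything inside pattern itself.

```python
from typing import Callable, Dict, Optional


def make_singularizer(base: Callable[[str], str],
                      exceptions: Optional[Dict[str, str]] = None) -> Callable[[str], str]:
    """Return a singularize function that checks `exceptions` before falling back to `base`."""
    # Private copy, so later edits to the caller's dict don't leak in unexpectedly.
    table = dict(exceptions or {})

    def singularize_with_exceptions(word: str) -> str:
        # Exact-match lookup first; only unknown words reach the generic rules.
        if word in table:
            return table[word]
        return base(word)

    # Expose the table so more irregular nouns can be registered later.
    singularize_with_exceptions.exceptions = table
    return singularize_with_exceptions


# Hypothetical stand-in for pattern.en.singularize: strip one trailing 's'.
def naive_singularize(word: str) -> str:
    return word[:-1] if word.endswith("s") else word


my_singularize = make_singularizer(
    naive_singularize,
    {"gases": "gas", "spectra": "spectrum", "cross": "cross", "nuclei": "nucleus"},
)
print(my_singularize("gases"))     # gas (from the exception table)
print(my_singularize("patterns"))  # pattern (from the fallback)
my_singularize.exceptions["indices"] = "index"
print(my_singularize("indices"))   # index
```

With pattern installed you would pass `pattern.en.singularize` as `base` instead of the stand-in. Building the table once also avoids recreating the dict on every call, as the original function does.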
As mentioned in the comments, you would be better off lemmatizing the words. The lemmatizer is part of NLTK's `nltk.stem` module.
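The reason a lemmatizer gets "gases" right where pure suffix rules give "gase" is that candidate forms are checked against a dictionary. The toy below sketches that idea under stated assumptions: `LEXICON` is a hypothetical miniature vocabulary standing in for WordNet's noun index, `EXCEPTIONS` mimics WordNet's irregular-form list, and the suffix rules are only loosely modeled on WordNet's noun substitutions. It is an illustration, not NLTK's implementation.

```python
# Hypothetical mini-lexicon standing in for WordNet's noun index.
LEXICON = {"gas", "pattern", "cross", "nucleus", "spectrum", "box", "church", "city"}

# Irregular plurals are checked first (WordNet keeps a similar exception list).
EXCEPTIONS = {"nuclei": "nucleus", "spectra": "spectrum"}

# Suffix substitutions tried in order, loosely modeled on WordNet's noun rules.
NOUN_RULES = [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
              ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")]


def toy_lemmatize(word: str) -> str:
    """Return a dictionary-validated singular form, or the word unchanged."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    if word in LEXICON:
        return word
    for suffix, replacement in NOUN_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # Accept a rule's output only if it is a real word in the lexicon.
            if candidate in LEXICON:
                return candidate
    return word


print(toy_lemmatize("gases"))   # gas ('gase' is rejected, 'ses' -> 's' succeeds)
print(toy_lemmatize("nuclei"))  # nucleus
print(toy_lemmatize("cities"))  # city
```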
from nltk.stem import WordNetLemmatizer
wnl = WordNetLemmatizer()
test_words = ['gases', 'spectrum','cross','nuclei']
%timeit [wnl.lemmatize(wrd) for wrd in test_words]
10000 loops, best of 3: 60.5 µs per loop
Compared with your function:
%timeit [my_singularize(wrd) for wrd in test_words]
1000 loops, best of 3: 162 µs per loop
In this benchmark, NLTK's lemmatizer is roughly 2.7 times faster (60.5 µs vs. 162 µs per loop).
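`%timeit` is an IPython magic, so it won't work in a plain script. The standard-library `timeit` module can produce a comparable "best of 3" figure. In this minimal sketch, `lookup_only` and `best_per_loop` are hypothetical helpers; swap in `my_singularize` or `wnl.lemmatize` to time the real functions.

```python
import timeit

# Exception table mirroring the question's exceptionDict.
EXCEPTIONS = {"gases": "gas", "spectra": "spectrum", "cross": "cross", "nuclei": "nucleus"}
TEST_WORDS = ["gases", "spectrum", "cross", "nuclei"]


def lookup_only(word):
    # Hypothetical stand-in for the function under test.
    return EXCEPTIONS.get(word, word)


def best_per_loop(func, words, number=10000, repeat=3):
    """Best average time in seconds for one pass over `words`, like %timeit's 'best of 3'."""
    timings = timeit.repeat(lambda: [func(w) for w in words], number=number, repeat=repeat)
    return min(timings) / number


per_loop = best_per_loop(lookup_only, TEST_WORDS)
print(f"{per_loop * 1e6:.2f} µs per loop")
```

Taking the minimum of several repeats reduces the influence of background load on the machine, which is why `%timeit` historically reported the best run.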