Getting the base form of an English word

Tags: python, text-processing, nlp, stemming, morphological-analysis

I am trying to get the base form of an English word that has been modified from that base form. This question has been asked here before, but I didn't see a proper answer, so I am asking it this way. I tried two stemmers and one lemmatizer from the NLTK package: the Porter stemmer, the Snowball stemmer, and the WordNet lemmatizer.

I tried this code:

from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.wordnet import WordNetLemmatizer

# Build each stemmer/lemmatizer once instead of on every loop iteration
porter_stemmer = PorterStemmer()
snowball_stemmer = SnowballStemmer("english")
lemmatizer = WordNetLemmatizer()  # needs the WordNet data: nltk.download('wordnet')

words = ['arrival', 'conclusion', 'ate']

for word in words:
    print("\n\nOriginal Word =>", word)
    print("porter stemmer=>", porter_stemmer.stem(word))
    print("snowball stemmer=>", snowball_stemmer.stem(word))
    print("WordNet Lemmatizer=>", lemmatizer.lemmatize(word))

This is the output I get:

Original Word => arrival
porter stemmer=> arriv
snowball stemmer=> arriv
WordNet Lemmatizer=> arrival


Original Word => conclusion
porter stemmer=> conclus
snowball stemmer=> conclus
WordNet Lemmatizer=> conclusion


Original Word => ate
porter stemmer=> ate
snowball stemmer=> ate
WordNet Lemmatizer=> ate

But I want this output:

    Input : arrival
    Output: arrive

    Input : conclusion
    Output: conclude

    Input : ate
    Output: eat

How can I achieve this? Are there any tools already available for this? I know this is called morphological analysis, but there must be some tools that already do it. Any help is appreciated :)

First Edit

I tried this code:

import nltk
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk.corpus import wordnet as wn

query = "The Indian economy is the worlds tenth largest by nominal GDP and third largest by purchasing power parity"

def is_noun(tag):
    return tag in ['NN', 'NNS', 'NNP', 'NNPS']

def is_verb(tag):
    return tag in ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']

def is_adverb(tag):
    return tag in ['RB', 'RBR', 'RBS']

def is_adjective(tag):
    return tag in ['JJ', 'JJR', 'JJS']

def penn_to_wn(tag):
    if is_adjective(tag):
        return wn.ADJ
    elif is_noun(tag):
        return wn.NOUN
    elif is_adverb(tag):
        return wn.ADV
    elif is_verb(tag):
        return wn.VERB
    return wn.NOUN

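# Needs NLTK's tokenizer, POS-tagger and WordNet data packages (see nltk.download())
lemmatizer = WordNetLemmatizer()
tags = nltk.pos_tag(word_tokenize(query))
for word, tag in tags:
    wn_tag = penn_to_wn(tag)
    print(word + "---> " + lemmatizer.lemmatize(word, wn_tag))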

Here I used the WordNet lemmatizer and passed it the proper part-of-speech tag for each word. This is the output:

The---> The
Indian---> Indian
economy---> economy
is---> be
the---> the
worlds---> world
tenth---> tenth
largest---> large
by---> by
nominal---> nominal
GDP---> GDP
and---> and
third---> third
largest---> large
by---> by
purchasing---> purchase
power---> power
parity---> parity
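
As an aside, the four `is_*` helpers in `penn_to_wn` can be collapsed: every Penn Treebank adjective, noun, adverb and verb tag starts with J, N, R or V, and NLTK's WordNet POS constants are the one-letter strings 'a', 'n', 'r' and 'v' (`wn.ADJ`, `wn.NOUN`, `wn.ADV`, `wn.VERB`). This is a sketch under that assumption; `PENN_PREFIX_TO_WN` is a hypothetical name, and the mapping is slightly broader than the original (the particle tag RP also starts with R, so it maps to adverb instead of noun).

```python
# NLTK's WordNet POS constants are plain one-letter strings:
# wn.ADJ == 'a', wn.NOUN == 'n', wn.ADV == 'r', wn.VERB == 'v'.
PENN_PREFIX_TO_WN = {'J': 'a', 'N': 'n', 'R': 'r', 'V': 'v'}


def penn_to_wn(tag, default='n'):
    """Map a Penn Treebank tag to a WordNet POS letter (noun if unknown)."""
    # tag[:1] is '' for an empty tag, which falls through to the default
    return PENN_PREFIX_TO_WN.get(tag[:1], default)


print(penn_to_wn('VBZ'), penn_to_wn('JJS'), penn_to_wn('DT'))
```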

Still, words like "arrival" and "conclusion" are not reduced to "arrive" and "conclude" with this approach. Is there any solution for this?


asked Nov 07 '14 07:11

Gunjan

1 Answer

OK, so for the word "ate", I think you're looking for NodeBox::Linguistics:

import en  # NodeBox English Linguistics library
print(en.verb.present("gave"))
# give

And I don't completely understand why you want the verb of "arrival" but not the one of "conclusion".

answered Sep 20 '22 22:09

Lior
