Getting the basic form of an English word

I am trying to get the base form of an English word that has been modified from that base form. This question has been asked here before, but I didn't see a proper answer, so I am rephrasing it. I tried two stemmers and one lemmatizer from the NLTK package: the Porter stemmer, the Snowball stemmer, and the WordNet lemmatizer.

I tried this code:

from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.wordnet import WordNetLemmatizer

words = ['arrival', 'conclusion', 'ate']

# Build each stemmer/lemmatizer once instead of on every iteration.
porter_stemmer = PorterStemmer()
snowball_stemmer = SnowballStemmer("english")
lemmatizer = WordNetLemmatizer()

for word in words:
    print("\n\nOriginal Word =>", word)
    print("porter stemmer=>", porter_stemmer.stem(word))
    print("snowball stemmer=>", snowball_stemmer.stem(word))
    # Without a POS argument, lemmatize() treats the word as a noun.
    print("WordNet Lemmatizer=>", lemmatizer.lemmatize(word))

This is the output I get:

Original Word => arrival
porter stemmer=> arriv
snowball stemmer=> arriv
WordNet Lemmatizer=> arrival


Original Word => conclusion
porter stemmer=> conclus
snowball stemmer=> conclus
WordNet Lemmatizer=> conclusion


Original Word => ate
porter stemmer=> ate
snowball stemmer=> ate
WordNet Lemmatizer=> ate

But I want this output:

    Input : arrival
    Output: arrive

    Input : conclusion
    Output: conclude

    Input : ate
    Output: eat 

How can I achieve this? Are there any tools already available for this? I am aware that this is called morphological analysis, but there must be tools that already do it. Help is appreciated :)

First Edit

I tried this code:

import nltk
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk.corpus import wordnet as wn

query = "The Indian economy is the worlds tenth largest by nominal GDP and third largest by purchasing power parity"

def is_noun(tag):
    return tag in ['NN', 'NNS', 'NNP', 'NNPS']

def is_verb(tag):
    return tag in ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']

def is_adverb(tag):
    return tag in ['RB', 'RBR', 'RBS']

def is_adjective(tag):
    return tag in ['JJ', 'JJR', 'JJS']

def penn_to_wn(tag):
    if is_adjective(tag):
        return wn.ADJ
    elif is_noun(tag):
        return wn.NOUN
    elif is_adverb(tag):
        return wn.ADV
    elif is_verb(tag):
        return wn.VERB
    return wn.NOUN

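lemmatizer = WordNetLemmatizer()
tags = nltk.pos_tag(word_tokenize(query))
for token, tag in tags:
    # Convert the Penn Treebank tag to the POS constant WordNet expects.
    wn_tag = penn_to_wn(tag)
    print(token + "---> " + lemmatizer.lemmatize(token, wn_tag))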

Here, I tried to use the WordNet lemmatizer with the proper POS tags. Here is the output:

The---> The
Indian---> Indian
economy---> economy
is---> be
the---> the
worlds---> world
tenth---> tenth
largest---> large
by---> by
nominal---> nominal
GDP---> GDP
and---> and
third---> third
largest---> large
by---> by
purchasing---> purchase
power---> power
parity---> parity

Still, words like "arrival" and "conclusion" won't get processed with this approach. Is there any solution for this?
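As an aside, the four is_* helpers and penn_to_wn from the first edit could probably be collapsed into a single prefix lookup. This is only a sketch: the name PENN_PREFIX_TO_WN is made up here, and the WordNet POS letters are hard-coded (they are the values of wn.ADJ, wn.NOUN, wn.ADV and wn.VERB) so that the snippet runs without NLTK installed.

```python
# Two-letter Penn Treebank prefixes mapped to WordNet POS letters.
# "a", "n", "r", "v" are the values of wn.ADJ, wn.NOUN, wn.ADV, wn.VERB.
PENN_PREFIX_TO_WN = {"JJ": "a", "NN": "n", "RB": "r", "VB": "v"}


def penn_to_wn(tag, default="n"):
    """Map a Penn Treebank tag to a WordNet POS letter, defaulting to noun."""
    return PENN_PREFIX_TO_WN.get(tag[:2], default)


if __name__ == "__main__":
    for tag in ["JJR", "NNPS", "RBS", "VBD", "DT"]:
        print(tag, "->", penn_to_wn(tag))
```

Matching on the first two characters keeps the behaviour of the original helpers for the tags they list, and anything else (for example DT or RP) still falls back to noun.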

Gunjan asked Nov 07 '14 07:11



1 Answer

Ok, so... for the word "ate" I think you're looking for NodeBox::Linguistics.

import en  # NodeBox Linguistics, assuming it is importable as "en"

print(en.verb.present("gave"))
# give

And I did not completely understand why you want the verb for "arrival" but not the one for "conclusion".
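For "arrival" and "conclusion", one thing that may be worth trying is WordNet's derivationally related forms, which link a noun lemma to related lemmas of other parts of speech. Below is a minimal sketch, assuming NLTK 3 with the WordNet corpus downloaded. The function verb_forms and its lemmas_for parameter are hypothetical names introduced here, not part of NLTK; lemmas_for exists only so the lookup can be swapped out or tested without the corpus.

```python
from collections import Counter


def verb_forms(word, lemmas_for=None):
    """Return verb lemma names derivationally related to the noun `word`.

    Names that are linked from more noun senses come first.
    """
    if lemmas_for is None:
        # Imported lazily so the function can be defined without NLTK present.
        from nltk.corpus import wordnet as wn

        def lemmas_for(w):
            return wn.lemmas(w, pos=wn.NOUN)

    counts = Counter()
    for lemma in lemmas_for(word):
        for related in lemma.derivationally_related_forms():
            # Keep only related lemmas whose synset is a verb synset.
            if related.synset().pos() == "v":
                counts[related.name()] += 1
    return [name for name, _ in counts.most_common()]
```

Calling verb_forms("arrival") or verb_forms("conclusion") returns whatever verbs the installed WordNet data links to those nouns. The list can be empty or contain several candidates, so the result still needs a selection step.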

Lior answered Sep 20 '22 22:09