How to list all the forms of a word using NLTK in Python

I need to list all the forms (verb, noun, comparative, superlative, adjective, and adverb) of a word using the NLTK library in Python. For example, for the word "write" the result should be: wrote, writing, writer, written, etc. Also, if the word has comparative and superlative forms, as with "cold", the result should include "colder" and "coldest"; for "quick" it should include "quickly", and so on. Is there a way to do that?

asked Jan 09 '15 02:01 by ANjell

ANjell


People also ask

How do you tag words in NLTK?

POS tagging in NLTK marks up each word in a text with its part of speech, based on both the word's definition and its context. NLTK's default tagger (nltk.pos_tag) uses Penn Treebank tags such as CC, CD, EX, JJ, MD, NNP, PDT, PRP$, and TO. In short, a POS tagger assigns grammatical information to each word of a sentence.
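To connect tagging with the WordNet-based answer further down, one common approach (an assumption on my part, not something NLTK does for you) is to map a Penn Treebank tag to one of WordNet's POS codes by its first two letters. The helper name penn_to_wordnet is hypothetical; the sketch is plain Python and runs without NLTK installed.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code, or None.

    Hypothetical helper: JJ* -> 'a' (adjective), VB* -> 'v' (verb),
    NN* -> 'n' (noun), RB* -> 'r' (adverb); other tags have no match.
    """
    prefixes = {"JJ": "a", "VB": "v", "NN": "n", "RB": "r"}
    return prefixes.get(tag[:2])

print(penn_to_wordnet("VBD"))   # v
print(penn_to_wordnet("JJS"))   # a
print(penn_to_wordnet("PRP$"))  # None
```

The returned code could then be passed as the org_pos argument of a function such as morphify in the answer.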

How do you check if a word is a noun in Python?

if val in ('NN', 'NNS', 'NNP', 'NNPS'):  # val is the POS tag assigned to text
    print(text, "is a noun.")
else:
    print(text, "is not a noun.")
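A self-contained version of the same check, written as a small function over the tag alone. The name is_noun and the sample (word, tag) pairs are hypothetical; the four tags are Penn Treebank's noun tags (NN singular, NNS plural, NNP proper singular, NNPS proper plural).

```python
# Penn Treebank noun tags: common singular/plural, proper singular/plural.
NOUN_TAGS = frozenset({"NN", "NNS", "NNP", "NNPS"})

def is_noun(tag):
    """Return True if a Penn Treebank POS tag marks a noun (hypothetical helper)."""
    return tag in NOUN_TAGS

# Hard-coded (word, tag) pairs in the shape nltk.pos_tag returns.
tagged = [("dogs", "NNS"), ("bark", "VBP"), ("London", "NNP")]
for text, val in tagged:
    print(text, "is a noun." if is_noun(val) else "is not a noun.")
```

With NLTK installed and its tagger and tokenizer data downloaded, the pairs would come from nltk.pos_tag(nltk.word_tokenize(sentence)) instead of being hard-coded.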


1 Answer

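This is a late answer, but I hope it still helps. I improved the code a little and fixed some small bugs so that it works with newer NLTK versions. The original code can be found in George-Bogdan Ivanov's answer to "Convert words between verb/noun/adjective forms". It relies on WordNet's derivationally related forms, so it maps a word to related words of another part of speech (for example, a noun to verbs) rather than producing inflections such as "wrote" or "colder".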

from nltk.corpus import wordnet as wn
# Needs the WordNet corpus: run nltk.download('wordnet') once if it is missing.

def morphify(word, org_pos, target_pos):
    """Morph a word into related forms of another part of speech.

    org_pos and target_pos are WordNet POS codes such as 'n', 'v', 'a', 'r'.
    Returns a list of (word, probability) tuples, most likely first.
    """
    synsets = wn.synsets(word, pos=org_pos)

    # Word not found
    if not synsets:
        return []

    # Get all lemmas of the word's synsets that have the original POS
    lemmas = [l for s in synsets
              for l in s.lemmas() if s.pos() == org_pos]

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms())
                                    for l in lemmas]

    # Keep only the related lemmas whose synset has the target POS
    related_lemmas = [l for drf in derivationally_related_forms
                      for l in drf[1] if l.synset().pos() == target_pos]

    # Extract the words from the lemmas
    words = [l.name() for l in related_lemmas]
    len_words = len(words)

    # Build the result as a list of (word, probability) tuples
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # Return all the possibilities sorted by probability
    return result

print(morphify('sadness', 'n', 'v'))
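The final step of morphify turns the list of related words into "probability" values: each distinct word's share of all related lemmas found. That step can be tried on its own without WordNet. The sketch below uses a hypothetical helper, rank_by_frequency, and made-up input words (not actual WordNet output); it computes the same relative frequencies with collections.Counter in one pass instead of calling words.count(w) once per distinct word.

```python
from collections import Counter

def rank_by_frequency(words):
    """Return (word, share) pairs sorted by descending share (hypothetical helper).

    share = occurrences of the word / total number of words, as in morphify.
    """
    total = len(words)
    # most_common() already orders the words by count, highest first.
    return [(w, n / total) for w, n in Counter(words).most_common()]

print(rank_by_frequency(["sadden", "sadden", "sadden", "grieve"]))
# [('sadden', 0.75), ('grieve', 0.25)]
```

An empty input yields an empty list, since the comprehension never runs and so never divides by zero.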
answered Nov 23 '22 19:11 by Duc Anh