Convert words between verb/noun/adjective forms

I would like a Python library function that converts words across different parts of speech. Sometimes it should output multiple words (e.g. "coder" and "code" are both nouns from the verb "to code": one is the subject, the other the object).

```python
# :: String => List of String
print(verbify('writer'))      # => ['write']
print(nounize('written'))     # => ['writer']
print(adjectivate('write'))   # => ['written']
```

I mostly care about verbs <=> nouns, for a note-taking program I want to write. That is, I can write "caffeine antagonizes A1" or "caffeine is an A1 antagonist", and with some NLP it can figure out that they mean the same thing. (I know that's not easy, and that it will take NLP that parses rather than just tags, but I want to hack up a prototype.)
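A prototype of the note-matching idea can be tried before any real NLP is wired in. The sketch below assumes a small hand-written table (`NOMINALIZATIONS` is hypothetical, not from any library) that maps verbs to the nouns derived from them, implements `nounize`/`verbify` with the question's "String => List of String" shape on top of it, and reduces both example sentences to the same (subject, verb, object) triple. The regular expressions only cover the two sentence shapes from the question.

```python
import re

# Hypothetical hand-written table: verb -> nouns derived from it.
# A real prototype might fill this from WordNet instead.
NOMINALIZATIONS = {
    "antagonize": ["antagonist"],
    "code": ["coder", "code"],
    "write": ["writer"],
}

# Inverse table: noun -> verbs it comes from.
VERBS_FOR_NOUN = {}
for _verb, _nouns in NOMINALIZATIONS.items():
    for _noun in _nouns:
        VERBS_FOR_NOUN.setdefault(_noun, []).append(_verb)


def nounize(verb):
    """String -> list of strings (empty if the verb is unknown)."""
    return NOMINALIZATIONS.get(verb, [])


def verbify(noun):
    """String -> list of strings (empty if the noun is unknown)."""
    return VERBS_FOR_NOUN.get(noun, [])


def canonical(sentence):
    """Map 'X is a(n) Y <noun>' and 'X <verb>s Y' to (X, verb, Y).

    Only these two sentence shapes are recognised; anything else gives None.
    """
    m = re.fullmatch(r"(\w+) is an? (\w+) (\w+)", sentence)
    if m and verbify(m.group(3)):
        subject, obj, noun = m.groups()
        return (subject, verbify(noun)[0], obj)
    m = re.fullmatch(r"(\w+) (\w+)s (\w+)", sentence)
    if m:
        return m.groups()
    return None
```

Under these assumptions, `canonical("caffeine antagonizes A1")` and `canonical("caffeine is an A1 antagonist")` both return `('caffeine', 'antagonize', 'A1')`, so the two notes compare equal.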

Similar question: Converting adjectives and adverbs to their noun forms (its answer only stems words down to the root POS; I want to go between POS.)

P.S. This is called conversion in linguistics: http://en.wikipedia.org/wiki/Conversion_%28linguistics%29
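In the strict sense, conversion is the case where the word changes part of speech without changing form ("code" the verb vs. "code" the noun), as opposed to affixed derivations like "coder". If a converter returns (form, score) pairs, a small post-processing step can pull the same-spelling conversions apart from the other related forms. `split_conversions` is a hypothetical helper sketched for illustration; the sample pairs are the `('hunger', 'n', 'v')` result reported in one of the answers on this page.

```python
def split_conversions(word, candidates):
    """Separate conversions (same spelling, new POS) from other related forms.

    `candidates` is a list of (form, score) pairs; scores are ignored here.
    """
    target = word.lower()
    conversions = [form for form, _ in candidates if form.lower() == target]
    others = [form for form, _ in candidates if form.lower() != target]
    return conversions, others


print(split_conversions('hunger', [('hunger', 0.75), ('thirst', 0.25)]))
# (['hunger'], ['thirst'])
```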

asked Jan 23 '13 at 21:01 by sam boosalis





2 Answers

This is more of a heuristic approach. I have just coded it, so apologies for the style. It uses derivationally_related_forms() from WordNet. I have implemented nounify; I guess verbify works analogously. From what I've tested, it works pretty well:

```python
from nltk.corpus import wordnet as wn


def nounify(verb_word):
    """ Transform a verb to the closest noun: die -> death """
    verb_synsets = wn.synsets(verb_word, pos="v")

    # Word not found
    if not verb_synsets:
        return []

    # Get all verb lemmas of the word
    # (lemmas(), name() and synset() are methods in NLTK 3, not attributes)
    verb_lemmas = [l for s in verb_synsets
                   for l in s.lemmas() if s.name().split('.')[1] == 'v']

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms())
                                    for l in verb_lemmas]

    # filter only the nouns
    related_noun_lemmas = [l for drf in derivationally_related_forms
                           for l in drf[1] if l.synset().name().split('.')[1] == 'n']

    # Extract the words from the lemmas
    words = [l.name() for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # return all the possibilities sorted by probability
    return result
```
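The last step of `nounify` builds the (word, probability) list by iterating over a `set`, so words with equal scores come out in an order that can change between runs (string hashing is randomized per process). A possible alternative, sketched below as a hypothetical helper `rank_by_frequency`, uses `collections.Counter`, whose `most_common()` keeps ties in first-seen order. The input list in the usage line is made up for illustration.

```python
from collections import Counter


def rank_by_frequency(words):
    """Turn a list of lemma names into (word, relative frequency) pairs.

    Sorted by descending frequency; Counter.most_common keeps ties in
    first-seen order, so the output is reproducible across runs.
    """
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(w, n / total) for w, n in counts.most_common()]


print(rank_by_frequency(['death', 'dying', 'death']))
```

Inside `nounify`, `return rank_by_frequency(words)` could stand in for the `len_words`, `result` and `sort` lines; the empty-list guard also makes the divide-by-zero case explicit.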
answered Sep 29 '22 at 13:09 by bogs



Here is a function that, in theory, converts words between noun/verb/adjective/adverb forms. I updated it from here (originally written by bogs, I believe) to be compliant with NLTK 3.2.5, now that synset.lemmas and synset.name are functions.

```python
from nltk.corpus import wordnet as wn

# Just to make it a bit more readable
WN_NOUN = 'n'
WN_VERB = 'v'
WN_ADJECTIVE = 'a'
WN_ADJECTIVE_SATELLITE = 's'
WN_ADVERB = 'r'


def convert(word, from_pos, to_pos):
    """ Transform words given from/to POS tags """

    synsets = wn.synsets(word, pos=from_pos)

    # Word not found
    if not synsets:
        return []

    # Get all lemmas of the word (consider 'a' and 's' equivalent)
    lemmas = []
    for s in synsets:
        for l in s.lemmas():
            if (s.name().split('.')[1] == from_pos
                    or (from_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE)
                        and s.name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE))):
                lemmas += [l]

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms()) for l in lemmas]

    # filter only the desired pos (consider 'a' and 's' equivalent)
    related_lemmas = []

    for drf in derivationally_related_forms:
        for l in drf[1]:
            if (l.synset().name().split('.')[1] == to_pos
                    or (to_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE)
                        and l.synset().name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE))):
                related_lemmas += [l]

    # Extract the words from the lemmas
    words = [l.name() for l in related_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # return all the possibilities sorted by probability
    return result


convert('direct', 'a', 'r')
convert('direct', 'a', 'n')
convert('quick', 'a', 'r')
convert('quickly', 'r', 'a')
convert('hunger', 'n', 'v')
convert('run', 'v', 'a')
convert('tired', 'a', 'r')
convert('tired', 'a', 'v')
convert('tired', 'a', 'n')
convert('tired', 'a', 's')
convert('wonder', 'v', 'n')
convert('wonder', 'n', 'a')
```
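The two long `if` conditions in `convert` repeat one rule: tags match exactly, or both are adjective tags ('a' or 's'). That rule could be factored into a helper; `pos_matches` below is a hypothetical name, not part of NLTK. In `convert` it would be called as `pos_matches(s.name().split('.')[1], from_pos)` and `pos_matches(l.synset().name().split('.')[1], to_pos)`; NLTK's `Synset.pos()` also returns the tag directly, without splitting the name.

```python
# Same tag values as the WN_* constants used in convert()
WN_ADJECTIVE = 'a'
WN_ADJECTIVE_SATELLITE = 's'
ADJECTIVE_TAGS = (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE)


def pos_matches(actual, wanted):
    """True if a synset's POS tag matches the requested one.

    Adjectives ('a') and adjective satellites ('s') count as equivalent.
    """
    if actual == wanted:
        return True
    return actual in ADJECTIVE_TAGS and wanted in ADJECTIVE_TAGS
```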

As the output below shows, convert() doesn't work that well. It can't switch between adjective and adverb forms (my specific goal), but it does give some interesting results in other cases.

```
>>> convert('direct', 'a', 'r')
[]
>>> convert('direct', 'a', 'n')
[('directness', 0.6666666666666666), ('line', 0.3333333333333333)]
>>> convert('quick', 'a', 'r')
[]
>>> convert('quickly', 'r', 'a')
[]
>>> convert('hunger', 'n', 'v')
[('hunger', 0.75), ('thirst', 0.25)]
>>> convert('run', 'v', 'a')
[('persistent', 0.16666666666666666), ('executive', 0.16666666666666666), ('operative', 0.16666666666666666), ('prevalent', 0.16666666666666666), ('meltable', 0.16666666666666666), ('operant', 0.16666666666666666)]
>>> convert('tired', 'a', 'r')
[]
>>> convert('tired', 'a', 'v')
[]
>>> convert('tired', 'a', 'n')
[('triteness', 0.25), ('banality', 0.25), ('tiredness', 0.25), ('commonplace', 0.25)]
>>> convert('tired', 'a', 's')
[]
>>> convert('wonder', 'v', 'n')
[('wonder', 0.3333333333333333), ('wonderer', 0.2222222222222222), ('marveller', 0.1111111111111111), ('marvel', 0.1111111111111111), ('wonderment', 0.1111111111111111), ('question', 0.1111111111111111)]
>>> convert('wonder', 'n', 'a')
[('curious', 0.4), ('wondrous', 0.2), ('marvelous', 0.2), ('marvellous', 0.2)]
```

Hope this saves someone a little trouble.
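The empty lists for the adjective/adverb calls suggest that derivationally_related_forms() doesn't link pairs like "quick"/"quickly". As a stopgap for the adjective-to-adverb direction only, a rule-based suffix heuristic (swapped in here, not a WordNet feature) covers the regular English spelling patterns. It will be wrong for irregular words such as "good" -> "well", "public" -> "publicly" or "whole" -> "wholly". For the reverse direction, WordNet's adverb-to-adjective pointers exposed as `Lemma.pertainyms()` in NLTK may be worth checking.

```python
VOWELS = set("aeiou")


def adjective_to_adverb(adj):
    """Guess the -ly adverb for an English adjective (heuristic only).

    Applies regular spelling rules; irregular pairs are not handled.
    """
    adj = adj.lower()
    if adj.endswith("ic"):                       # basic -> basically
        return adj + "ally"
    if adj.endswith("ll"):                       # full -> fully
        return adj + "y"
    if adj.endswith("le") and len(adj) > 2 and adj[-3] not in VOWELS:
        return adj[:-1] + "y"                    # gentle -> gently
    if adj.endswith("y") and len(adj) > 2 and adj[-2] not in VOWELS:
        return adj[:-1] + "ily"                  # happy -> happily
    return adj + "ly"                            # quick -> quickly


for word in ["quick", "direct", "happy", "gentle", "basic", "full"]:
    print(word, "->", adjective_to_adverb(word))
```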

answered Sep 29 '22 at 12:09 by stuart
