i would like a python library function that translates/converts across different parts of speech. sometimes it should output multiple words (e.g. "coder" and "code" are both nouns from the verb "to code", one's the subject the other's the object)
```python
# :: String => List of String
print(verbify('writer'))     # => ['write']
print(nounize('written'))    # => ['writer']
print(adjectivate('write'))  # => ['written']
```
i mostly care about verbs <=> nouns, for a note taking program i want to write. i.e. i can write "caffeine antagonizes A1" or "caffeine is an A1 antagonist" and with some NLP it can figure out they mean the same thing. (i know that's not easy, and that it will take NLP that parses and doesn't just tag, but i want to hack up a prototype).
similar questions ... Converting adjectives and adverbs to their noun forms (this answer only stems down to the root POS. i want to go between POS.)
ps called Conversion in linguistics http://en.wikipedia.org/wiki/Conversion_%28linguistics%29
Adding Suffixes The most common suffixes used to create adjectives are -ly, -able, -al, -ous, -ary, -ful, -ic, -ish, -less, -like and -y. For example, turn the noun "danger" into the adjective "dangerous" by adding the suffix -ous.
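As a rough illustration of the suffix idea, here is a minimal sketch that appends each of the suffixes listed above to a noun. The function name `adjective_candidates` and the optional `vocabulary` filter are made up for this example. Plain concatenation produces mostly non-words and ignores spelling changes (e.g. "beauty" becoming "beautiful"), so the output is only a list of candidates to check against a real word list.

```python
# Suffixes taken from the paragraph above.
ADJECTIVE_SUFFIXES = ["ly", "able", "al", "ous", "ary", "ful",
                      "ic", "ish", "less", "like", "y"]


def adjective_candidates(noun, vocabulary=None):
    """Return noun + suffix for every suffix.

    If a vocabulary (any container of known words) is given,
    keep only the candidates that appear in it.
    """
    candidates = [noun + suffix for suffix in ADJECTIVE_SUFFIXES]
    if vocabulary is not None:
        candidates = [c for c in candidates if c in vocabulary]
    return candidates


# Toy vocabulary, for illustration only.
print(adjective_candidates("danger", vocabulary={"dangerous", "friendly"}))
# ['dangerous']
```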
To change a verb to a noun, first locate the verb, or action word, in the sentence. Then, add a determiner like “the” or “a” before the verb to make it into a noun. Next, rewrite or rearrange the sentence so that it makes sense.
This is more of a heuristic approach. I have just coded it, so apologies for the style. It uses derivationally_related_forms() from WordNet. I have implemented nounify; I guess verbify works analogously. From what I've tested, it works pretty well:
```python
from nltk.corpus import wordnet as wn

# Note: this uses the older NLTK API, where lemmas, name and synset
# were attributes. In NLTK 3 they are methods: s.lemmas(), s.name(),
# l.synset(), l.name().


def nounify(verb_word):
    """ Transform a verb to the closest noun: die -> death """
    verb_synsets = wn.synsets(verb_word, pos="v")

    # Word not found
    if not verb_synsets:
        return []

    # Get all verb lemmas of the word
    verb_lemmas = [l for s in verb_synsets
                   for l in s.lemmas if s.name.split('.')[1] == 'v']

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms())
                                    for l in verb_lemmas]

    # filter only the nouns
    related_noun_lemmas = [l for drf in derivationally_related_forms
                           for l in drf[1] if l.synset.name.split('.')[1] == 'n']

    # Extract the words from the lemmas
    words = [l.name for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # return all the possibilities sorted by probability
    return result
```
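The last step of nounify, turning the list of related words into (word, probability) tuples, can be looked at on its own. The "probability" is really just each word's share of all the related lemmas that were found. Below is a small sketch of that step using `collections.Counter`. The name `rank_by_frequency` is made up for this example, and the alphabetical tie-break is an addition that the code above does not have; it is only there to make the order deterministic.

```python
from collections import Counter


def rank_by_frequency(words):
    """Return (word, share) tuples, highest share first.

    share = occurrences of the word / total number of words.
    Ties are broken alphabetically so the order is deterministic.
    An empty input gives an empty list.
    """
    counts = Counter(words)
    total = sum(counts.values())
    ranked = [(word, count / total) for word, count in counts.items()]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


print(rank_by_frequency(["death", "death", "dying", "death"]))
# [('death', 0.75), ('dying', 0.25)]
```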
Here is a function that is, in theory, able to convert words between noun/verb/adjective/adverb forms. I updated it from here (originally written by bogs, I believe) to be compliant with nltk 3.2.5, now that synset.lemmas and synset.name are functions.
```python
from nltk.corpus import wordnet as wn

# Just to make it a bit more readable
WN_NOUN = 'n'
WN_VERB = 'v'
WN_ADJECTIVE = 'a'
WN_ADJECTIVE_SATELLITE = 's'
WN_ADVERB = 'r'


def convert(word, from_pos, to_pos):
    """ Transform words given from/to POS tags """

    synsets = wn.synsets(word, pos=from_pos)

    # Word not found
    if not synsets:
        return []

    # Get all lemmas of the word (consider 'a' and 's' equivalent)
    lemmas = []
    for s in synsets:
        for l in s.lemmas():
            if s.name().split('.')[1] == from_pos or from_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and s.name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
                lemmas += [l]

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms()) for l in lemmas]

    # filter only the desired pos (consider 'a' and 's' equivalent)
    related_noun_lemmas = []

    for drf in derivationally_related_forms:
        for l in drf[1]:
            if l.synset().name().split('.')[1] == to_pos or to_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and l.synset().name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
                related_noun_lemmas += [l]

    # Extract the words from the lemmas
    words = [l.name() for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # return all the possibilities sorted by probability
    return result


convert('direct', 'a', 'r')
convert('direct', 'a', 'n')
convert('quick', 'a', 'r')
convert('quickly', 'r', 'a')
convert('hunger', 'n', 'v')
convert('run', 'v', 'a')
convert('tired', 'a', 'r')
convert('tired', 'a', 'v')
convert('tired', 'a', 'n')
convert('tired', 'a', 's')
convert('wonder', 'v', 'n')
convert('wonder', 'n', 'a')
```
As you can see below, it doesn't work so great. It's unable to switch between adjective and adverb form (my specific goal), but it does give some interesting results in other cases.
```
>>> convert('direct', 'a', 'r')
[]
>>> convert('direct', 'a', 'n')
[('directness', 0.6666666666666666), ('line', 0.3333333333333333)]
>>> convert('quick', 'a', 'r')
[]
>>> convert('quickly', 'r', 'a')
[]
>>> convert('hunger', 'n', 'v')
[('hunger', 0.75), ('thirst', 0.25)]
>>> convert('run', 'v', 'a')
[('persistent', 0.16666666666666666), ('executive', 0.16666666666666666), ('operative', 0.16666666666666666), ('prevalent', 0.16666666666666666), ('meltable', 0.16666666666666666), ('operant', 0.16666666666666666)]
>>> convert('tired', 'a', 'r')
[]
>>> convert('tired', 'a', 'v')
[]
>>> convert('tired', 'a', 'n')
[('triteness', 0.25), ('banality', 0.25), ('tiredness', 0.25), ('commonplace', 0.25)]
>>> convert('tired', 'a', 's')
[]
>>> convert('wonder', 'v', 'n')
[('wonder', 0.3333333333333333), ('wonderer', 0.2222222222222222), ('marveller', 0.1111111111111111), ('marvel', 0.1111111111111111), ('wonderment', 0.1111111111111111), ('question', 0.1111111111111111)]
>>> convert('wonder', 'n', 'a')
[('curious', 0.4), ('wondrous', 0.2), ('marvelous', 0.2), ('marvellous', 0.2)]
```
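One likely reason for the empty adjective/adverb results: as far as I know, WordNet does not record the adverb-to-adjective link as a derivationally related form. NLTK exposes it through `Lemma.pertainyms()` instead. The sketch below follows those pointers from an adverb back to its adjective(s). The function name `adverb_to_adjectives` and the `wordnet` parameter are made up for this example; the parameter is only there so the WordNet reader can be swapped out. It handles only the adverb to adjective direction. Going from adjective to adverb would need a reverse lookup over adverb lemmas, which is not shown here.

```python
def adverb_to_adjectives(adverb, wordnet=None):
    """Collect adjectives linked to an adverb via WordNet pertainym pointers.

    `wordnet` defaults to nltk.corpus.wordnet; any object with a
    compatible synsets() method can be passed in instead.
    """
    if wordnet is None:
        # Needs NLTK plus the WordNet corpus (nltk.download('wordnet')).
        from nltk.corpus import wordnet

    adjectives = []
    for synset in wordnet.synsets(adverb, pos="r"):
        for lemma in synset.lemmas():
            # Only follow pointers from the lemma matching the input word,
            # not from its synonyms in the same synset.
            if lemma.name().lower() != adverb.lower():
                continue
            for related in lemma.pertainyms():
                # Treat 'a' (adjective) and 's' (satellite) as equivalent.
                is_adjective = related.synset().pos() in ("a", "s")
                if is_adjective and related.name() not in adjectives:
                    adjectives.append(related.name())
    return adjectives


# With NLTK and the WordNet data installed:
# adverb_to_adjectives("quickly")
```

This could be tried as a fallback inside convert when from_pos is 'r' and to_pos is 'a' or 's'. I have not checked how complete the pertainym coverage is, so treat it as a starting point.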
hope this is able to save someone a little trouble