
How to get the WordNet sense frequency of a synset in NLTK?

According to the documentation, I can load a sense-tagged corpus in NLTK like this:

>>> from nltk.corpus import wordnet_ic
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')

I can also get the definition, POS, offset, and examples like this:

>>> from nltk.corpus import wordnet as wn
>>> wn.synset('dog.n.01').examples()    # methods in NLTK 3; attributes in NLTK 2
>>> wn.synset('dog.n.01').definition()

But how can I get the frequency of a synset from a corpus? To break down the question:

  1. First, how do I count how many times a synset occurs in a sense-tagged corpus?
  2. Then, the next step is to divide that count by the total count of all synset occurrences for the given lemma.
alvas asked Mar 21 '13 15:03



1 Answer

I managed to do it this way.

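from nltk.corpus import wordnet as wn

word = "dog"
synsets = wn.synsets(word)

# Map each synset, keyed as "<offset>-<pos>", to its frequency.
sense2freq = {}
for s in synsets:
    # Sum the counts of every lemma in the synset.
    # In NLTK 3, lemmas(), offset() and pos() are methods, not attributes.
    freq = sum(lemma.count() for lemma in s.lemmas())
    # offset() is an int, so format it rather than concatenating.
    sense2freq[f"{s.offset()}-{s.pos()}"] = freq

for key, freq in sense2freq.items():
    print(key, freq)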
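
For the second step in the question (dividing each count by the total), here is a minimal sketch, assuming NLTK 3 with the WordNet data installed. The helper names `relative_frequencies` and `sense_counts` are my own and not part of NLTK. The zero-total guard is there because the counts for a word can all be zero.

```python
def relative_frequencies(counts):
    """Normalize raw counts so they sum to 1 (all 0.0 if the total is 0)."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value / total for key, value in counts.items()}


def sense_counts(word):
    """Hypothetical helper: per-synset counts, keyed as "<offset>-<pos>"."""
    # Imported here so the normalization above works without NLTK.
    from nltk.corpus import wordnet as wn

    counts = {}
    for synset in wn.synsets(word):
        # Same counting as above: sum the counts of all lemmas in the synset.
        freq = sum(lemma.count() for lemma in synset.lemmas())
        counts[f"{synset.offset()}-{synset.pos()}"] = freq
    return counts
```

Calling `relative_frequencies(sense_counts("dog"))` should then give each synset's share of the total count for that word.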
alvas answered Nov 11 '22 18:11
