NLTK provides functions for printing all the words in the Brown (or Gutenberg) corpus, but the equivalent function does not seem to work for WordNet.
Is there a way to do this through NLTK? If there is not, how might one do it?
This works:
from nltk.corpus import brown as b
print(b.words())
This causes an AttributeError:
from nltk.corpus import wordnet as wn
print(wn.words())
WordNet is a lexical database of English created at Princeton and distributed as part of the NLTK corpus collection; related wordnets also exist for many other languages. It groups nouns, verbs, adjectives, and adverbs into sets of cognitive synonyms (synsets), each expressing a distinct concept. You can use it through NLTK to look up the meanings of words, their synonyms, antonyms, and more.
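For example, to look up senses, glosses, synonym lemmas, and antonyms for a word (a minimal sketch; the word "good" is just an illustrative choice):
from nltk.corpus import wordnet as wn

for syn in wn.synsets('good')[:3]:          # first few senses of "good"
    print(syn.name(), '-', syn.definition())

good = wn.synset('good.a.01')               # one adjective sense
print([l.name() for l in good.lemmas()])    # synonym lemmas of this sense
print(good.lemmas()[0].antonyms())          # antonym lemmas, e.g. 'bad'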
WordNet is a word-sense resource, so its entries are indexed by senses (a.k.a. synsets).
To iterate through the synsets:
>>> from nltk.corpus import wordnet as wn
>>> for ss in wn.all_synsets():
...     print(ss)
...     print(ss.definition())
...     break
...
Synset('able.a.01')
(usually followed by `to') having the necessary means or skill or know-how or authority to do something
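Removing the break walks the entire database; a quick way to gauge its size (the exact counts depend on the WordNet version shipped with your NLTK data; the figures below are for WordNet 3.0):
>>> sum(1 for _ in wn.all_synsets())      # all synsets
117659
>>> sum(1 for _ in wn.all_synsets('n'))   # noun synsets only
82115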
For each synset (sense/concept), there is a list of words attached to it, called lemmas: lemmas are the canonical ("root") forms of the words we look up when we check a dictionary.
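For instance, the first noun sense of "dog" carries three lemmas (using dog.n.01 purely as an illustration):
>>> wn.synset('dog.n.01').lemma_names()
['dog', 'domestic_dog', 'Canis_familiaris']
>>> wn.synset('dog.n.01').lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]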
To get the full list of lemmas in WordNet using a one-liner (chain comes from itertools):
>>> from itertools import chain
>>> lemmas_in_wordnet = set(chain(*[ss.lemma_names() for ss in wn.all_synsets()]))
Interestingly, wn.words() will also return all the lemma names:
>>> lemmas_in_words = set(i for i in wn.words())
>>> len(lemmas_in_wordnet)
148730
>>> len(lemmas_in_words)
147306
But strangely, the two totals do not match: wn.words() yields slightly fewer unique items.
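To see where the two collections differ, compare them as sets. A quick sketch (the exact differences will depend on your NLTK/WordNet version; casing of lemma names is one likely source of the gap):
only_from_synsets = lemmas_in_wordnet - lemmas_in_words
only_from_words = lemmas_in_words - lemmas_in_wordnet
print(len(only_from_synsets), sorted(only_from_synsets)[:10])
print(len(only_from_words), sorted(only_from_words)[:10])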
"Printing the full content" of wordnet into text seems to be something too ambitious, because wordnet
is structured sort of like a hierarchical graph, with synsets interconnected to each other and each synset has its own properties/attributes. That's why the wordnet files are not kept simply as a single textfile.
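That said, if a flat text approximation is good enough, you can dump one line per synset yourself (a minimal sketch; the file name and tab-separated layout are arbitrary choices):
from nltk.corpus import wordnet as wn

# One tab-separated line per synset: name, POS, lemmas, definition.
with open('wordnet_dump.txt', 'w', encoding='utf-8') as out:
    for ss in wn.all_synsets():
        out.write('\t'.join([
            ss.name(),
            ss.pos(),
            ','.join(ss.lemma_names()),
            ss.definition(),
        ]) + '\n')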
To see what a synset contains:
>>> first_synset = next(wn.all_synsets())
>>> dir(first_synset)
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_all_hypernyms', '_definition', '_examples', '_frame_ids', '_hypernyms', '_instance_hypernyms', '_iter_hypernym_lists', '_lemma_names', '_lemma_pointers', '_lemmas', '_lexname', '_max_depth', '_min_depth', '_name', '_needs_root', '_offset', '_pointers', '_pos', '_related', '_shortest_hypernym_paths', '_wordnet_corpus_reader', 'also_sees', 'attributes', 'causes', 'closure', 'common_hypernyms', 'definition', 'entailments', 'examples', 'frame_ids', 'hypernym_distances', 'hypernym_paths', 'hypernyms', 'hyponyms', 'instance_hypernyms', 'instance_hyponyms', 'jcn_similarity', 'lch_similarity', 'lemma_names', 'lemmas', 'lexname', 'lin_similarity', 'lowest_common_hypernyms', 'max_depth', 'member_holonyms', 'member_meronyms', 'min_depth', 'name', 'offset', 'part_holonyms', 'part_meronyms', 'path_similarity', 'pos', 'region_domains', 'res_similarity', 'root_hypernyms', 'shortest_path_distance', 'similar_tos', 'substance_holonyms', 'substance_meronyms', 'topic_domains', 'tree', 'unicode_repr', 'usage_domains', 'verb_groups', 'wup_similarity']
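A few of these attributes in action, again with dog.n.01 purely as an example:
dog = wn.synset('dog.n.01')
print(dog.hypernyms())      # more general synsets, e.g. canine.n.02
print(dog.hyponyms()[:5])   # more specific synsets (breeds etc.)
print(dog.examples())       # example sentences for this sense
print(dog.lexname())        # lexicographer file, e.g. 'noun.animal'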
Going through this howto would be helpful for learning how to access the information you need in WordNet: http://www.nltk.org/howto/wordnet.html
This will print, for each word in WordNet, the set of synonyms collected from all of its synsets:
from nltk.corpus import wordnet as wn

synonyms = []
for word in wn.words():
    print(word, end=":")
    for syn in wn.synsets(word):
        for l in syn.lemmas():
            synonyms.append(l.name())
    print(set(synonyms), end="\n")
    synonyms.clear()
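Since that loop prints an entry for every one of the ~147k lemma names returned by wn.words(), it can help to preview only the first few, e.g. with itertools.islice (a small variation on the code above, not part of the original answer):
from itertools import islice
from nltk.corpus import wordnet as wn

for word in islice(wn.words(), 10):   # preview the first 10 entries only
    synonyms = {l.name() for syn in wn.synsets(word) for l in syn.lemmas()}
    print(word, synonyms, sep=": ")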