How do I print out just the word itself in a WordNet synset using Python NLTK?

Question

Is there a way in Python 2.7 using NLTK to get just the word, without the extra formatting such as "Synset", the parentheses, and the "n.01" suffix?

For instance if I do

        from nltk.corpus import wordnet as wn
        wn.synsets('dog')

My results look like:

[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

How can I instead get a list like this?

dog
frump
cad
frank
pawl
andiron
chase

Is there a way to do this using NLTK, or do I have to use regular expressions? Can I use regular expressions within a Python script?

Frank Riccobono · Accepted Answer

If you want to do this without regular expressions, you can use a list comprehension.

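[synset.name.split('.')[0] for synset in wn.synsets('dog')]

For each synset, this takes the name (for example 'dog.n.01') and keeps only the part before the first period. Note that in NLTK 3.0 and later `name` is a method, so there you would write `synset.name()` instead of `synset.name`.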
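The list in the question contains each word only once, while the comprehension returns 'dog' twice. Here is a minimal sketch of one way to drop the repeats while keeping the order. The helper name `head_words` is hypothetical, and it uses `rsplit('.', 2)` rather than `split('.')` on the assumption that a synset name always ends in a part of speech and a sense number, which also covers words that contain a period themselves.

```python
def head_words(synset_names):
    """Return the word part of each synset name, in order, without repeats."""
    seen = set()
    words = []
    for name in synset_names:
        # 'dog.n.01' -> ['dog', 'n', '01']; splitting from the right keeps
        # any periods that belong to the word itself.
        word = name.rsplit('.', 2)[0]
        if word not in seen:
            seen.add(word)
            words.append(word)
    return words


# Synset names copied from the output shown in the question.
names = ['dog.n.01', 'frump.n.01', 'dog.n.03', 'cad.n.01',
         'frank.n.02', 'pawl.n.01', 'andiron.n.01', 'chase.v.01']
print(head_words(names))
# ['dog', 'frump', 'cad', 'frank', 'pawl', 'andiron', 'chase']
```

With NLTK 3 you could feed it live data as `head_words(s.name() for s in wn.synsets('dog'))`.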

user3776949 · Answer

Try this:

for synset in wn.synsets('dog'):
    print synset.lemmas[0].name

This iterates over each synset for 'dog' and prints the name of its first lemma, which is the synset's headword. Keep in mind that several words can belong to the same synset, so if you want all the words associated with all the synsets for 'dog', you could do:

for synset in wn.synsets('dog'):
    for lemma in synset.lemmas:
        print lemma.name
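The loops above use the Python 2 print statement and attribute-style `lemmas` and `name`. In NLTK 3.0 and later both are methods. Below is a rough equivalent for Python 3 with NLTK 3, offered as a sketch: the function name `lemma_names` is hypothetical, and skipping repeated names is an added choice rather than part of the original answer.

```python
def lemma_names(synsets):
    """Return every lemma name in the given synsets, in order, without repeats."""
    names = []
    for synset in synsets:
        for lemma in synset.lemmas():   # a method call in NLTK 3, not an attribute
            name = lemma.name()         # likewise a method in NLTK 3
            if name not in names:
                names.append(name)
    return names


if __name__ == "__main__":
    try:
        from nltk.corpus import wordnet as wn
        for name in lemma_names(wn.synsets('dog')):
            print(name)
    except (ImportError, LookupError):
        # Raised when NLTK or the WordNet corpus is missing;
        # nltk.download('wordnet') fetches the corpus.
        print("NLTK with the WordNet corpus is required for this demo")
```

Multi-word lemmas come back with underscores in place of spaces, so you may want `name.replace('_', ' ')` before displaying them.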

Tags: python, regex, nltk, wordnet

TheFishes
