WordNet Python words similarity

Tags: python, semantics, nlp, nltk

I'm trying to find a reliable way to measure the semantic similarity of two terms. The first metric could be the path distance on a hyponym/hypernym graph (possibly a linear combination of 2-3 metrics would be better).

from nltk.corpus import wordnet as wn
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
print(dog.path_similarity(cat))

I still don't get what n.01 means and why it's necessary.
Is there a way to visually show the computed path between 2 terms?
Which other nltk semantic metric could I use?

asked Jan 22 '17 17:01

alfredopacino

1 Answer

1. I still don't get what n.01 means and why it's necessary.

The linked page and the NLTK source show that a synset name has the form "WORD.PART-OF-SPEECH.SENSE-NUMBER".

Quoting the source:

Create a Lemma from a "<word>.<pos>.<number>.<lemma>" string where:
<word> is the morphological stem identifying the synset
<pos> is one of the module attributes ADJ, ADJ_SAT, ADV, NOUN or VERB
<number> is the sense number, counting from 0.
<lemma> is the morphological form of interest

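n means noun and 01 is the sense number. The docstring above says the count starts from 0, but synset names such as dog.n.01 start at 01 for the first sense. I also suggest reading about the WordNet dataset.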

2. Is there a way to visually show the computed path between 2 terms?

Please look at the similarity section of the NLTK WordNet docs. There are several path-based algorithms to choose from (you can try mixing several).

A few examples from the NLTK docs:

from nltk.corpus import wordnet as wn
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

print(dog.path_similarity(cat))
print(dog.lch_similarity(cat))
print(dog.wup_similarity(cat))
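To see the path itself rather than only the score, one option is to join the hypernym chains of the two synsets at their deepest shared ancestor. The sketch below is an assumption about how that could be done, not something taken from the NLTK docs: connecting_path and wordnet_path are hypothetical helper names, and only wn.synset, Synset.hypernym_paths() and Synset.name() are NLTK calls.

```python
def connecting_path(path_a, path_b):
    """Join two root-to-node paths at their deepest shared ancestor.

    Each argument is a list ordered from the root down to the node.
    Returns the nodes visited when walking from the end of path_a up
    to the shared ancestor and then down to the end of path_b, or
    None when the two paths share no ancestor.
    """
    # Length of the common prefix of the two root-to-node paths.
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    if shared == 0:
        return None
    up = path_a[shared - 1:][::-1]  # node A ... shared ancestor
    down = path_b[shared:]          # below the ancestor ... node B
    return up + down


def wordnet_path(name_a, name_b):
    """Hypothetical wrapper: shortest joined path between two synsets."""
    # Imported lazily so connecting_path() can be used without NLTK.
    from nltk.corpus import wordnet as wn

    a, b = wn.synset(name_a), wn.synset(name_b)
    # A synset can have several hypernym paths, so try every pair.
    candidates = [
        connecting_path(pa, pb)
        for pa in a.hypernym_paths()
        for pb in b.hypernym_paths()
    ]
    candidates = [c for c in candidates if c is not None]
    if not candidates:
        return None
    return [s.name() for s in min(candidates, key=len)]
```

Calling wordnet_path('dog.n.01', 'cat.n.01') (with the WordNet corpus downloaded) returns a list of synset names that can be printed with ' -> '.join(...) or fed to a graph-drawing library.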

For the visualization, you can build a matrix M[i,j] of pairwise similarities (higher means closer) where:

M[i,j] = word_similarity(i, j)

and use the linked Stack Overflow answer to draw the visualization.
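A minimal sketch of building such a matrix, assuming the similarity function is passed in as a parameter. similarity_matrix and wordnet_matrix are hypothetical names chosen for illustration; pairs for which the metric returns None are stored as 0.0, which is an assumption you may want to change.

```python
def similarity_matrix(items, similarity):
    """Return M as a list of lists with M[i][j] = similarity(items[i], items[j])."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            score = similarity(items[i], items[j])
            # Some metrics return None when no path exists; store 0.0 instead.
            matrix[i][j] = 0.0 if score is None else score
    return matrix


def wordnet_matrix(names):
    """Hypothetical wrapper: path_similarity matrix for a list of synset names."""
    # Imported lazily so similarity_matrix() can be used without NLTK.
    from nltk.corpus import wordnet as wn

    synsets = [wn.synset(name) for name in names]
    return similarity_matrix(synsets, lambda a, b: a.path_similarity(b))
```

For example, wordnet_matrix(['dog.n.01', 'cat.n.01', 'car.n.01']) gives a 3x3 matrix that can then be drawn as a heatmap with the plotting library of your choice.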

3. Which other nltk semantic metric could I use?

As mentioned above, there are several ways to calculate word similarity. I also suggest looking into gensim. I used its word2vec implementation for word similarities and it worked well for me.

If you need help choosing an algorithm, please provide more information about the problem you are facing.
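For context, word2vec-style similarity is normally the cosine of the angle between two word vectors. The sketch below shows only that calculation in plain Python; it does not call gensim, and the toy vectors are made-up numbers for illustration, not real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional vectors, purely illustrative.
toy_vectors = {
    "dog": [0.9, 0.1, 0.3],
    "cat": [0.8, 0.2, 0.4],
    "car": [0.1, 0.9, 0.2],
}

print(cosine_similarity(toy_vectors["dog"], toy_vectors["cat"]))
print(cosine_similarity(toy_vectors["dog"], toy_vectors["car"]))
```

Unlike the WordNet metrics, this needs no sense number, because each word has a single vector learned from a corpus.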

Update:

More information about what the sense number means:

Senses in WordNet are generally ordered from most to least frequently used, with the most common sense numbered 1...

The problem is that "dog" is ambiguous, and you must choose the right meaning for it.

You might choose the first sense as a naive approach, or find your own algorithm for choosing the right meaning depending on your application or research.
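One simple alternative to always taking the first sense is to score every pair of senses and keep the best one. This is a sketch of that idea under stated assumptions, not a recommendation for every application: best_sense_pair and wordnet_word_similarity are hypothetical helper names, and the wrapper relies only on wn.synsets and Synset.path_similarity from NLTK.

```python
def best_sense_pair(senses_a, senses_b, similarity):
    """Return (score, sense_a, sense_b) for the highest-scoring pair.

    Pairs for which similarity() returns None are skipped. Returns
    None when no pair yields a score.
    """
    best = None
    for a in senses_a:
        for b in senses_b:
            score = similarity(a, b)
            if score is None:
                continue
            if best is None or score > best[0]:
                best = (score, a, b)
    return best


def wordnet_word_similarity(word_a, word_b):
    """Hypothetical wrapper: best path_similarity over all sense pairs."""
    # Imported lazily so best_sense_pair() can be used without NLTK.
    from nltk.corpus import wordnet as wn

    return best_sense_pair(
        wn.synsets(word_a),
        wn.synsets(word_b),
        lambda a, b: a.path_similarity(b),
    )
```

The result also tells you which two senses were matched, which helps when checking whether the chosen meanings make sense for your data.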

To get all available definitions (called synsets in the WordNet docs) of a word, you can simply call wn.synsets(word).

I encourage you to dig into the metadata that each of these synsets contains.

The code below shows a simple example that gets this metadata and prints it nicely.

from nltk.corpus import wordnet as wn

dog_synsets = wn.synsets('dog')

for i, syn in enumerate(dog_synsets):
    print('%d. %s' % (i, syn.name()))
    print('alternative names (lemmas): "%s"' % '", "'.join(syn.lemma_names()))
    print('definition: "%s"' % syn.definition())
    if syn.examples():
        print('example usage: "%s"' % '", "'.join(syn.examples()))
    print('\n')

Code output:

0. dog.n.01
alternative names (lemmas): "dog", "domestic_dog", "Canis_familiaris"
definition: "a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds"
example usage: "the dog barked all night"


1. frump.n.01
alternative names (lemmas): "frump", "dog"
definition: "a dull unattractive unpleasant girl or woman"
example usage: "she got a reputation as a frump", "she's a real dog"


2. dog.n.03
alternative names (lemmas): "dog"
definition: "informal term for a man"
example usage: "you lucky dog"


3. cad.n.01
alternative names (lemmas): "cad", "bounder", "blackguard", "dog", "hound", "heel"
definition: "someone who is morally reprehensible"
example usage: "you dirty dog"


4. frank.n.02
alternative names (lemmas): "frank", "frankfurter", "hotdog", "hot_dog", "dog", "wiener", "wienerwurst", "weenie"
definition: "a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll"


5. pawl.n.01
alternative names (lemmas): "pawl", "detent", "click", "dog"
definition: "a hinged catch that fits into a notch of a ratchet to move a wheel forward or prevent it from moving backward"


6. andiron.n.01
alternative names (lemmas): "andiron", "firedog", "dog", "dog-iron"
definition: "metal supports for logs in a fireplace"
example usage: "the andirons were too hot to touch"


7. chase.v.01
alternative names (lemmas): "chase", "chase_after", "trail", "tail", "tag", "give_chase", "dog", "go_after", "track"
definition: "go after with the intent to catch"
example usage: "The policeman chased the mugger down the alley", "the dog chased the rabbit"

answered Sep 19 '22 18:09

ShmulikA
