Check the similarity between two words with NLTK in Python

Tags: python, similarity, nltk

I have two lists, and I want to check the similarity between each pair of words across the two lists and find the maximum similarity. Here is my code:

from nltk.corpus import wordnet

list1 = ['Compare', 'require']
list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'how', 'identify', 'label', 'list', 'listen', 'locate', 'match', 'memorise', 'name', 'observe', 'omit', 'quote', 'read', 'recall', 'recite', 'recognise', 'record', 'relate', 'remember', 'repeat', 'reproduce', 'retell', 'select', 'show', 'spell', 'state', 'tell', 'trace', 'write']
list = []

for word1 in list1:
    for word2 in list2:
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
        s = wordFromList1.wup_similarity(wordFromList2)
        list.append(s)

print(max(list))

But this results in an error:

wordFromList2 = wordnet.synsets(word2)[0]
        IndexError: list index out of range

Please help me fix this.
Thank you.

asked Jun 14 '15 12:06

Punuth

2 Answers

You're getting an error because a synset list is empty and you try to get the element at the (non-existent) index zero. But why check only the zeroth element? If you want to check everything, try all pairs of elements in the returned synsets. You can use itertools.product() to save yourself two for-loops:

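from itertools import product
from nltk.corpus import wordnet

sims = []

for word1, word2 in product(list1, list2):
    syns1 = wordnet.synsets(word1)
    syns2 = wordnet.synsets(word2)
    # Compare every sense of word1 with every sense of word2
    for sense1, sense2 in product(syns1, syns2):
        d = wordnet.wup_similarity(sense1, sense2)
        sims.append((d, sense1, sense2))  # d is None when no connecting path is found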

This is inefficient because the same synsets are looked up again and again, but it is the closest to the logic of your code. If you have enough data to make speed an issue, you can speed it up by collecting the synsets for all words in list1 and list2 once, and taking the product of the two collections. The "or 0" replaces a None score (returned when no connecting path is found) with 0, so that max() can compare the tuples:

>>> allsyns1 = set(ss for word in list1 for ss in wordnet.synsets(word))
>>> allsyns2 = set(ss for word in list2 for ss in wordnet.synsets(word))
>>> best = max((wordnet.wup_similarity(s1, s2) or 0, s1, s2) for s1, s2 in
...         product(allsyns1, allsyns2))
>>> print(best)
(0.9411764705882353, Synset('command.v.02'), Synset('order.v.01'))
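
The set-based version reports the best pair of synsets but no longer tells you which words they came from. If that matters, one possible variation is sketched below: it looks each word up once, keeps the word alongside its senses, and skips pairs with no score. The function name best_word_pair and its lookup/similarity parameters are hypothetical, and the sense names and scores are invented stand-ins so the sketch runs without the WordNet data.

```python
from itertools import product

def best_word_pair(words1, words2, lookup, similarity):
    """Return (score, word1, word2) for the most similar pair, or None."""
    # Look each word up exactly once and remember which word owns which senses.
    senses1 = {w: lookup(w) for w in words1}
    senses2 = {w: lookup(w) for w in words2}
    best = None
    for (w1, ss1), (w2, ss2) in product(senses1.items(), senses2.items()):
        # Words with no senses contribute an empty product and are skipped.
        for s1, s2 in product(ss1, ss2):
            score = similarity(s1, s2)
            # Ignore pairs without a score instead of comparing None to a float.
            if score is not None and (best is None or score > best[0]):
                best = (score, w1, w2)
    return best

# Invented stand-ins for a sense lookup and a similarity measure.
fake_senses = {"compare": ["c1", "c2"], "match": ["m1"], "xyzzy": []}
fake_scores = {("c1", "m1"): 0.4, ("c2", "m1"): 0.9}

result = best_word_pair(
    ["compare"], ["match", "xyzzy"],
    lookup=lambda w: fake_senses.get(w, []),
    similarity=lambda a, b: fake_scores.get((a, b)),
)
print(result)  # (0.9, 'compare', 'match')
```

With NLTK installed and the WordNet corpus downloaded, the same function should accept wordnet.synsets as lookup and wordnet.wup_similarity as similarity, since those are the two calls used in the code above.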

answered Oct 12 '22 22:10

alexis

Try checking whether the synset lists are empty before you use them:

from nltk.corpus import wordnet

list1 = ['Compare', 'require']
list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'how', 'identify', 'label', 'list', 'listen', 'locate', 'match', 'memorise', 'name', 'observe', 'omit', 'quote', 'read', 'recall', 'recite', 'recognise', 'record', 'relate', 'remember', 'repeat', 'reproduce', 'retell', 'select', 'show', 'spell', 'state', 'tell', 'trace', 'write']
sims = []  # named "sims" because "list" would shadow the built-in

for word1 in list1:
    for word2 in list2:
        wordFromList1 = wordnet.synsets(word1)
        wordFromList2 = wordnet.synsets(word2)
        if wordFromList1 and wordFromList2: #Thanks to @alexis' note
            s = wordFromList1[0].wup_similarity(wordFromList2[0])
            if s is not None:  # None means no connecting path was found
                sims.append(s)

print(max(sims))
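
To see which words are being skipped by the emptiness check, a small helper along these lines might help. It is only a sketch: words_without_senses is a hypothetical name, and the dictionary is an invented stand-in for the real lookup so the example runs without the WordNet data.

```python
def words_without_senses(words, lookup):
    """Return the words for which lookup() gives an empty result."""
    return [w for w in words if not lookup(w)]

# Invented stand-in for a sense lookup; "xyzzy" deliberately has no entry.
fake_senses = {"choose": ["choose_sense"], "copy": ["copy_sense"]}

missing = words_without_senses(
    ["choose", "copy", "xyzzy"],
    lambda w: fake_senses.get(w, []),
)
print(missing)  # ['xyzzy']
```

Passing wordnet.synsets as lookup should list the words in list1 or list2 that triggered the original IndexError.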

answered Oct 12 '22 22:10

omerbp
