
Python spaCy beginner: similarity function

Tags:

python

nlp

spacy

When I run the spaCy tutorial example in Python, apples.similarity(oranges) returns 0.39289959293092641 instead of the expected 0.7857989796519943.

Is there a reason for that? The original tutorial docs: https://spacy.io/docs/ and a tutorial showing a different result from the one I get: http://textminingonline.com/getting-started-with-spacy

Thanks

aiedu asked Jun 17 '26 02:06



1 Answer

That appears to be a bug in spaCy.

Somehow vector_norm is calculated incorrectly.

import numpy as np
import spacy

nlp = spacy.load("en")
# using u"apples" just as an example
apples = nlp.vocab[u"apples"]
# the norm as reported by spaCy
print(apples.vector_norm)
# prints 1.4142135381698608, or sqrt(2)
# the norm computed directly from the vector
print(np.sqrt(np.dot(apples.vector, apples.vector)))
# prints 1.0

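vector_norm is then used in similarity. In the example above it is sqrt(2) times the true norm; when both norms are inflated by that factor, the denominator doubles, so similarity returns half of the correct value.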

def similarity(self, other):
    if self.vector_norm == 0 or other.vector_norm == 0:
        return 0.0
    return numpy.dot(self.vector, other.vector) / (self.vector_norm * other.vector_norm)
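To make the halving concrete, here is a small sketch using only NumPy. The two vectors are made-up unit vectors, not real spaCy word vectors, and the sqrt(2) inflation of both norms is an assumption based on the value observed for "apples" above.

```python
import numpy as np

# Toy unit-length vectors standing in for word vectors (made-up values).
a = np.array([1.0, 0.0])
b = np.array([0.6, 0.8])

# True Euclidean norms, computed directly from the vectors.
true_norm_a = np.linalg.norm(a)
true_norm_b = np.linalg.norm(b)

# Assumption: each reported norm is sqrt(2) times the true norm.
bad_norm_a = np.sqrt(2) * true_norm_a
bad_norm_b = np.sqrt(2) * true_norm_b

# Same formula as similarity(), once with true norms, once with inflated ones.
correct = np.dot(a, b) / (true_norm_a * true_norm_b)
reported = np.dot(a, b) / (bad_norm_a * bad_norm_b)

# The ratio comes out at roughly 0.5, matching the halved score in the question.
print(correct, reported, reported / correct)
```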

If you are only ranking similarity scores, for example for synonyms, this might be OK, because a constant factor does not change the ordering. But if you need the correct cosine similarity score, the result is incorrect.
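Until that is fixed, one possible workaround is to compute the cosine similarity directly from the vectors and not rely on vector_norm. The helper below is a minimal sketch; cosine_similarity is a hypothetical name of my own, not part of spaCy, and it only assumes the inputs are 1-D numeric arrays of equal length.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity computed from the raw vectors (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Compute the norms from the vectors themselves, not from a cached attribute.
    norm_u = np.linalg.norm(u)
    norm_v = np.linalg.norm(v)
    # Mirror the zero-vector guard used in similarity().
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return float(np.dot(u, v) / (norm_u * norm_v))
```

With the objects from the snippets above, this would presumably be called as cosine_similarity(apples.vector, oranges.vector), where oranges is looked up the same way as apples.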

I submitted the issue here. Hopefully it will get fixed soon.

Ethan answered Jun 19 '26 16:06



