Python spaCy beginner: similarity function

Question

In the spaCy tutorial example in Python, the result of apples.similarity(oranges) is 0.39289959293092641 instead of 0.7857989796519943.

Is there a reason for that? The original tutorial docs are at https://spacy.io/docs/ and a tutorial showing a different result from the one I get is at http://textminingonline.com/getting-started-with-spacy

Thanks

Ethan · Accepted Answer

That appears to be a bug in spaCy.

The vector_norm attribute is calculated incorrectly: it does not match the norm computed directly from the vector.

import spacy
import numpy as np

nlp = spacy.load("en")
# using u"apples" just as an example
apples = nlp.vocab[u"apples"]
# norm as stored by spaCy
print(apples.vector_norm)
# prints 1.4142135381698608, or sqrt(2)
# norm computed directly from the vector
print(np.sqrt(np.dot(apples.vector, apples.vector)))
# prints 1.0

vector_norm is then used in similarity. With each norm too large by a factor of sqrt(2), the denominator is too large by a factor of 2, so similarity returns half of the correct value.

def similarity(self, other):
    if self.vector_norm == 0 or other.vector_norm == 0:
        return 0.0
    return numpy.dot(self.vector, other.vector) / (self.vector_norm * other.vector_norm)

If you are ranking similarity scores for synonyms, this might be OK. But if you need the correct cosine similarity score, then the result is incorrect.
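Until that is fixed, one possible workaround is to compute the cosine similarity directly from the vectors and skip vector_norm entirely. The following is only a minimal sketch: cosine_similarity is a hypothetical helper, not part of spaCy, and the two toy vectors are made-up values standing in for real word vectors. It also reproduces the halving effect by dividing by sqrt(2) * sqrt(2).

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity computed from the raw vectors (hypothetical helper, not a spaCy API)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    # Mirror the zero-norm guard used in the quoted similarity method.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return float(np.dot(a, b) / (norm_a * norm_b))


# Toy unit-length vectors (made-up values, not real spaCy vectors).
u = np.array([1.0, 0.0])
v = np.array([0.6, 0.8])

correct = cosine_similarity(u, v)

# Reproduce the symptom: both norms reported as sqrt(2) instead of 1.0.
buggy = float(np.dot(u, v) / (np.sqrt(2) * np.sqrt(2)))

print(correct, buggy)
```

With the objects from the snippet above, the call would be something like cosine_similarity(apples.vector, oranges.vector), assuming oranges is looked up the same way as apples.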

I submitted the issue here. Hopefully it will get fixed soon.

Tags:

python

nlp

spacy

aiedu
