Similarity in spaCy

Tags: nlp, similarity, spacy

I am trying to understand how similarity in spaCy works. To test it, I compared Melania Trump's speech with Michelle Obama's speech to see how similar they are.

This is my code.

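import spacy

nlp = spacy.load('en_core_web_lg')  # large English model, includes word vectors

# Read both speeches, silently dropping non-ASCII characters
with open("melania.txt", encoding="ascii", errors="ignore") as f:
    file1 = f.read()
with open("michelle.txt", encoding="ascii", errors="ignore") as f:
    file2 = f.read()

doc1 = nlp(file1)
doc2 = nlp(file2)
print(doc1.similarity(doc2))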

I get the similarity score as 0.9951584208511974. This similarity score looks very high to me. Is this correct? Am I doing something wrong?


asked Nov 23 '18 22:11

thehydrogen

1 Answer

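By default, spaCy calculates cosine similarity. Similarity is determined by comparing word vectors (word embeddings), which are multi-dimensional representations of a word's meaning. For a Doc, the vector defaults to the average of its token vectors, so comparing two long texts means comparing two averages of hundreds of word vectors. Such averages tend to look alike, which is a likely reason two long speeches score so close to 1.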
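In symbols, writing a document's vector as the mean of its n token vectors (spaCy's documented default for Doc.vector) and then taking the cosine of two such vectors:

```latex
\mathbf{v}_{d} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{v}_{t_i},
\qquad
\operatorname{sim}(d_1, d_2) = \frac{\mathbf{v}_{d_1} \cdot \mathbf{v}_{d_2}}{\lVert \mathbf{v}_{d_1} \rVert \, \lVert \mathbf{v}_{d_2} \rVert}
```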

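Internally, the similarity method returns numpy.dot(self.vector, other.vector) / (self_norm * other_norm), where self_norm and other_norm are the Euclidean norms of the two vectors. You can reproduce spaCy's score by hand: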

import numpy as np
import spacy

nlp = spacy.load('en_core_web_lg')

text1 = 'How can I end violence?'
text2 = 'What should I do to be a peaceful?'
doc1 = nlp(text1)
doc2 = nlp(text2)
print("spaCy :", doc1.similarity(doc2))

# Manual cosine similarity of the two document vectors
print(np.dot(doc1.vector, doc2.vector) / (np.linalg.norm(doc1.vector) * np.linalg.norm(doc2.vector)))

Output:

spaCy : 0.916553147896471
0.9165532

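The two values agree up to floating-point precision, so doc.similarity() is simply the cosine similarity of the two documents' .vector attributes. According to spaCy's documentation, the word vectors in its English models come from GloVe.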
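To see why averaging pushes scores for long texts toward 1, here is a toy simulation in plain Python. It does not use spaCy or GloVe: the "word vectors" are synthetic, and it assumes (as a simplification) that every vector shares a common component plus word-specific random noise. The names mean_vector, cosine and toy_word_vector are hypothetical helpers for this sketch, not spaCy APIs.

```python
import math
import random


def mean_vector(vectors):
    """Average equal-length vectors (mimics mean pooling of token vectors)."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_word_vector(rng, shared, noise=1.0):
    """Hypothetical 'word vector': a shared component plus Gaussian noise."""
    return [s + rng.gauss(0.0, noise) for s in shared]


rng = random.Random(0)  # fixed seed so the run is reproducible
DIM = 50
shared = [1.0] * DIM  # assumption: all toy vectors share this component

# Two "documents" of 200 toy words each; every vector is drawn independently.
doc_a = [toy_word_vector(rng, shared) for _ in range(200)]
doc_b = [toy_word_vector(rng, shared) for _ in range(200)]

# Average similarity of individual word pairs across the two documents
word_sims = [cosine(u, v) for u, v in zip(doc_a, doc_b)]
avg_word_sim = sum(word_sims) / len(word_sims)

# Similarity of the averaged (document-level) vectors
doc_sim = cosine(mean_vector(doc_a), mean_vector(doc_b))

print(f"average word-pair similarity: {avg_word_sim:.3f}")
print(f"document similarity:          {doc_sim:.3f}")
```

In this toy setup, individual word pairs are only moderately similar, while the two averaged vectors come out nearly parallel because averaging cancels most of the word-specific noise and leaves the shared component. Long speeches plausibly behave the same way under spaCy's default averaging.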

answered Sep 23 '22 17:09

Srce Cde

Related questions

finding noun and verb in stanford parser
1 million sentences to save in DB - removing non-relevant English words
Fuzzy Group By, Grouping Similar Words
why Wordnet dictionary doesn't contain the word 'she'?
How to rank features by their importance in a Weka classifier?
Getting the root word using the Wordnet Lemmatizer
WordNet Python words similarity
Why can't I import functions in bert after pip install bert
Stemming - code examples or open source projects?
how to create exclamations for a particular sentence
Using Sentiwordnet 3.0
Getting adjective from an adverb in nltk or other NLP library
Difference between Python's collections.Counter and nltk.probability.FreqDist
Find the similarity between two string columns of a DataFrame
Natural language date parser for ruby/rails
How to include words as numerical feature in classification
How can I split at word boundaries with regexes?
Compare similarity between names
How do I do use non-integer string labels with SVM from scikit-learn? Python
How to filter tokens from spaCy document
