Calculate cosine similarity given 2 sentence strings

Tags: python, string, nlp, similarity, cosine-similarity

From the question "Python: tf-idf-cosine: to find document similarity", it is possible to calculate document similarity using tf-idf cosine. Without importing external libraries, are there any ways to calculate cosine similarity between two strings?

s1 = "This is a foo bar sentence ."
s2 = "This sentence is similar to a foo bar sentence ."
s3 = "What is this string ? Totally not related to the other two lines ."

cosine_sim(s1, s2) # Should give high cosine similarity
cosine_sim(s1, s3) # Shouldn't give high cosine similarity value
cosine_sim(s2, s3) # Shouldn't give high cosine similarity value


asked Mar 02 '13 10:03

alvas

1 Answer

A simple pure-Python implementation would be:

import math
import re
from collections import Counter

WORD = re.compile(r"\w+")


def get_cosine(vec1, vec2):
    # Dot product over the words the two vectors share
    intersection = set(vec1.keys()) & set(vec2.keys())
    numerator = sum(vec1[x] * vec2[x] for x in intersection)

    # Product of the Euclidean norms (lengths) of the two vectors
    sum1 = sum(count ** 2 for count in vec1.values())
    sum2 = sum(count ** 2 for count in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    # Avoid dividing by zero when either text contains no words
    if not denominator:
        return 0.0
    return float(numerator) / denominator


def text_to_vector(text):
    # Bag of words: map each \w+ token to the number of times it occurs
    words = WORD.findall(text)
    return Counter(words)


text1 = "This is a foo bar sentence ."
text2 = "This sentence is similar to a foo bar sentence ."

vector1 = text_to_vector(text1)
vector2 = text_to_vector(text2)

cosine = get_cosine(vector1, vector2)

print("Cosine:", cosine)

Prints (Python 3 displays more decimal places than shown here):

Cosine: 0.861640436855
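To connect this back to the question, the two helpers can be wrapped in a function with the cosine_sim(s1, s2) signature the question asks for. This is a minimal sketch: cosine_sim is just the question's hypothetical name, and the helpers are repeated in compact form so the snippet runs on its own.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")


def text_to_vector(text):
    # Bag of words: count each \w+ token (case-sensitive, punctuation dropped)
    return Counter(WORD.findall(text))


def get_cosine(vec1, vec2):
    # Dot product over shared words divided by the product of the vector norms
    numerator = sum(vec1[x] * vec2[x] for x in set(vec1) & set(vec2))
    norm1 = math.sqrt(sum(c * c for c in vec1.values()))
    norm2 = math.sqrt(sum(c * c for c in vec2.values()))
    denominator = norm1 * norm2
    return numerator / denominator if denominator else 0.0


def cosine_sim(a, b):
    # Hypothetical wrapper using the name from the question
    return get_cosine(text_to_vector(a), text_to_vector(b))


s1 = "This is a foo bar sentence ."
s2 = "This sentence is similar to a foo bar sentence ."
s3 = "What is this string ? Totally not related to the other two lines ."

print("s1-s2:", round(cosine_sim(s1, s2), 3))
print("s1-s3:", round(cosine_sim(s1, s3), 3))
print("s2-s3:", round(cosine_sim(s2, s3), 3))
```

This prints 0.862, 0.118 and 0.174 respectively, so the ordering matches the question's expectations. Because the regex is case-sensitive, "This" and "this" count as different words: s1 and s3 share only "is", and s2 and s3 share "is" and "to".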

The cosine formula used here is the standard one: the dot product of the two word-count vectors divided by the product of their Euclidean norms.
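Written out, with A and B the word-count vectors of text1 and text2 (the "." is not matched by \w+, and "sentence" occurs twice in text2), the computation behind the printed value is:

```latex
\begin{aligned}
\cos(A, B) &= \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
            = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}} \\
A \cdot B &= \underbrace{1 + 1 + 1 + 1 + 1}_{\text{This, is, a, foo, bar}}
            + \underbrace{1 \cdot 2}_{\text{sentence}} = 7 \\
\lVert A \rVert^2 &= 6, \qquad \lVert B \rVert^2 = 7 \cdot 1^2 + 2^2 = 11 \\
\cos(A, B) &= \frac{7}{\sqrt{6}\,\sqrt{11}} = \frac{7}{\sqrt{66}} \approx 0.8616
\end{aligned}
```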

This does not weight the words by tf-idf; to use tf-idf, you need a reasonably large corpus from which to estimate the tf-idf weights.
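As a rough sketch only, tf-idf weighting could be added in pure Python like this. It assumes raw term counts for tf and the smoothed idf log((1 + N) / (1 + df)) + 1, which is the variant scikit-learn's TfidfVectorizer uses by default; other definitions exist. The three sentences from the question stand in as the corpus purely for illustration; they are far too few to give meaningful weights. The helper names and the default_idf handling for unseen words are assumptions of this sketch.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")


def tokenize(text):
    return WORD.findall(text)


def idf_weights(corpus):
    # Smoothed idf: log((1 + N) / (1 + df)) + 1, where df is the number of
    # documents containing the term (one common variant among several)
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf, default_idf):
    # Raw term count times idf; terms missing from the corpus get default_idf
    counts = Counter(tokenize(text))
    return {term: tf * idf.get(term, default_idf) for term, tf in counts.items()}


def cosine(vec1, vec2):
    numerator = sum(vec1[t] * vec2[t] for t in set(vec1) & set(vec2))
    norm1 = math.sqrt(sum(w * w for w in vec1.values()))
    norm2 = math.sqrt(sum(w * w for w in vec2.values()))
    return numerator / (norm1 * norm2) if norm1 and norm2 else 0.0


corpus = [
    "This is a foo bar sentence .",
    "This sentence is similar to a foo bar sentence .",
    "What is this string ? Totally not related to the other two lines .",
]
idf = idf_weights(corpus)
# Assumption: treat a term never seen in the corpus as if its df were 0
default_idf = math.log(1 + len(corpus)) + 1

vectors = [tfidf_vector(doc, idf, default_idf) for doc in corpus]
print("tf-idf cosine(s1, s2):", round(cosine(vectors[0], vectors[1]), 3))
print("tf-idf cosine(s1, s3):", round(cosine(vectors[0], vectors[2]), 3))
```

Since "is" occurs in all three sentences it gets the smallest idf, so the s1/s3 score drops below the plain-count value (about 0.118). With a realistic corpus the weights would be far more informative.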

You can also develop it further by using a more sophisticated way to extract words from a piece of text, stemming or lemmatising them, etc.
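For example, still without external libraries, the word extraction could lowercase tokens, drop a few stop words and crudely strip plural "s" endings before counting. This is a minimal sketch: STOP_WORDS is a tiny hypothetical list, and normalize is a deliberately naive suffix rule, not a real stemmer or lemmatiser.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")
# Hypothetical, tiny stop-word list for illustration only
STOP_WORDS = {"a", "an", "is", "the", "this", "to"}


def normalize(token):
    # Lowercase, then crudely strip a plural "s" (not a real stemmer)
    token = token.lower()
    if len(token) > 4 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token


def tokens(text):
    # Normalize every \w+ token, then drop stop words
    normalized = (normalize(t) for t in WORD.findall(text))
    return [t for t in normalized if t not in STOP_WORDS]


def cosine_sim(a, b):
    vec1, vec2 = Counter(tokens(a)), Counter(tokens(b))
    numerator = sum(vec1[x] * vec2[x] for x in set(vec1) & set(vec2))
    norm1 = math.sqrt(sum(c * c for c in vec1.values()))
    norm2 = math.sqrt(sum(c * c for c in vec2.values()))
    return numerator / (norm1 * norm2) if norm1 and norm2 else 0.0


s1 = "This is a foo bar sentence ."
s2 = "This sentence is similar to a foo bar sentence ."
s3 = "What is this string ? Totally not related to the other two lines ."
print(round(cosine_sim(s1, s2), 3), cosine_sim(s1, s3))
```

This prints 0.873 0.0: once "is" and "this" are filtered out, s1 and s3 share no counted word. Libraries such as NLTK provide proper stop-word lists, stemmers and lemmatisers, but the question asked to avoid external dependencies.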


answered Nov 11 '22 04:11

vpekar
