I know it's possible to return how similar two strings are by using the following function:
from difflib import SequenceMatcher
def similar(a, b):
    output = SequenceMatcher(None, a, b).ratio()
    return output
In [37]: similar("Hey, this is a test!","Hey, man, this is a test, man.")
Out[37]: 0.76
In [38]: similar("This should be one.","This should be one.")
Out[38]: 1.0
But is it possible to score two dictionaries based on the similarity of their keys and corresponding values? I don't want the number of keys in common, or a list of what is in common, but a score from 0 to 1, like the string example above.
I'm trying to find the similarity score between ratings['Shane'] and ratings['Joe'] in this dictionary:
ratings={'Shane': {'127 Hours': 3.0, 'Avatar': 4.0, 'Nonstop': 5.0}, 'Joe': {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0}}
I am using Python 2.7.10
Using the == operator to compare two dictionaries: the simplest way to check whether two dictionaries are equal is the == operator, which checks whether both contain the same set of key: value pairs. It returns True if the dictionaries are equal and False if not, regardless of how the dictionaries were created.
You can also use set intersection on the dictionaries' keys(), then loop over the shared keys and check whether the values corresponding to those keys are identical.
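As a quick illustration of the two checks described above, here is a minimal sketch using the ratings from the question. The variable names are chosen for illustration only. Note that == answers only yes or no, and the shared-key loop only reports which values match, so neither produces a score between 0 and 1.

```python
shane = {'127 Hours': 3.0, 'Avatar': 4.0, 'Nonstop': 5.0}
joe = {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0}

# == is True only when both dicts hold exactly the same key: value pairs.
dicts_equal = (shane == joe)

# Keys present in both dicts (set intersection of the keys).
shared_keys = set(shane.keys()) & set(joe.keys())

# Shared keys whose values are identical in both dicts.
matching_keys = sorted(k for k in shared_keys if shane[k] == joe[k])

print(dicts_equal)
print(sorted(shared_keys))
print(matching_keys)
```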
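import math
ratings={'Shane': {'127 Hours': 3.0, 'Avatar': 4.0, 'Nonstop': 5.0}, 'Joe': {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0}}
def cosine_similarity(vec1, vec2):
    # Note: values are paired by list position, not by key, and only the
    # first len(vec1) entries of vec2 are used.
    sum11, sum12, sum22 = 0, 0, 0
    for i in range(len(vec1)):
        x = vec1[i]; y = vec2[i]
        sum11 += x*x
        sum22 += y*y
        sum12 += x*y
    return sum12/math.sqrt(sum11*sum22)
# Dict ordering is not guaranteed before Python 3.7, so these lists
# (and therefore the result) can vary between interpreter versions.
list1 = list(ratings['Shane'].values())
list2 = list(ratings['Joe'].values())
sim = cosine_similarity(list1, list2)
print(sim)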
Output: 0.9205746178983233
Update: when I use:
ratings={'Shane': {'127 Hours': 5.0, 'Avatar': 4.0, 'Nonstop': 5.0},
'Joe': {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0}}
Output: 0.9574271077563381
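The function above pairs values by list position, so a rating for one film can end up multiplied with a rating for a different film, and the result depends on dictionary ordering. Below is a minimal sketch that pairs values by shared key instead. The helper name cosine_on_shared_keys is hypothetical, and ignoring films rated by only one person is an assumption, not part of the original answer.

```python
import math

def cosine_on_shared_keys(d1, d2):
    """Cosine similarity computed only over keys present in both dicts."""
    shared = sorted(set(d1) & set(d2))
    if not shared:
        return 0.0  # assumption: no overlap counts as zero similarity
    v1 = [d1[k] for k in shared]
    v2 = [d2[k] for k in shared]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0

# Ratings from the original question (Shane rated '127 Hours' 3.0).
ratings = {'Shane': {'127 Hours': 3.0, 'Avatar': 4.0, 'Nonstop': 5.0},
           'Joe': {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0}}

sim = cosine_on_shared_keys(ratings['Shane'], ratings['Joe'])
print(sim)
```

With the original ratings this comes out at roughly 0.92, and it no longer changes with the order in which the dictionaries store their keys.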
Update 2: normalized the vector lengths and matched values by key
from math import sqrt
ratings={'Shane': {'127 Hours': 5.0, 'Avatar': 4.0, 'Nonstop': 5.0},
'Joe': {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0},
'Bob': {'Panic Room':5.0,'Nonstop':5.0}}
def square_rooted(x):
    # Euclidean norm of the vector, rounded to 3 decimal places.
    return round(sqrt(sum([a*a for a in x])), 3)
def cosine_similarity(x, y):
    vector2 = []
    # input1 is the dict with more entries, input2 the other one.
    if len(x) > len(y):
        input1 = x
        input2 = y
    else:
        input1 = y
        input2 = x
    vector1 = list(input1.values())
    # Normalizing input vectors: build vector2 in the key order of input1.
    # Keys that appear only in input2 are ignored.
    for k in input1.keys():
        if k in input2:
            vector2.append(float(input2[k]))  # value for a key common to both inputs
        else:
            vector2.append(float(0))  # missing rating treated as 0
    numerator = sum(a*b for a, b in zip(vector2, vector1))
    denominator = square_rooted(vector1)*square_rooted(vector2)
    return round(numerator/float(denominator), 3)
print("Similarity between Shane and Joe")
print(cosine_similarity(ratings['Shane'], ratings['Joe']))
print("Similarity between Joe and Bob")
print(cosine_similarity(ratings['Joe'], ratings['Bob']))
print("Similarity between Shane and Bob")
print(cosine_similarity(ratings['Shane'], ratings['Bob']))
output:
Similarity between Shane and Joe
0.887
Similarity between Joe and Bob
0.346
Similarity between Shane and Bob
0.615
A nice explanation of the differences between Jaccard and cosine similarity: https://datascience.stackexchange.com/questions/5121/applications-and-differences-for-jaccard-similarity-and-cosine-similarity
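For comparison with the cosine approach, Jaccard similarity looks only at which keys two dictionaries share and ignores the rating values. A minimal sketch, assuming the key sets are what should be compared (the helper name jaccard_keys is hypothetical):

```python
def jaccard_keys(d1, d2):
    """Jaccard similarity of the key sets: |intersection| / |union|."""
    keys1, keys2 = set(d1), set(d2)
    union = keys1 | keys2
    if not union:
        return 1.0  # assumption: two empty dicts are treated as identical
    return len(keys1 & keys2) / float(len(union))

ratings = {'Shane': {'127 Hours': 5.0, 'Avatar': 4.0, 'Nonstop': 5.0},
           'Joe': {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0},
           'Bob': {'Panic Room': 5.0, 'Nonstop': 5.0}}

shane_joe = jaccard_keys(ratings['Shane'], ratings['Joe'])  # 3 shared keys of 4 total
joe_bob = jaccard_keys(ratings['Joe'], ratings['Bob'])      # 1 shared key of 5 total
print(shane_joe)
print(joe_bob)
```

Because the values are ignored, two people who rated the same films with opposite scores still get a Jaccard score of 1.0, which is why the cosine approach is used here when the ratings themselves matter.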
I am using Python 3.4.
NOTE: I have assigned 0 to missing values, but you can substitute a more suitable value instead. See: http://www.analyticsvidhya.com/blog/2015/02/7-steps-data-exploration-preparation-building-model-part-2/
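To make the missing-value choice explicit, here is a minimal sketch of a cosine similarity built over the union of both key sets, with a parameter for the value assigned to films one person has not rated. The function name cosine_with_fill and the fill parameter are hypothetical. Unlike the code above, this version also counts films rated only by the person with fewer ratings and does not round the intermediate norms, so its scores will differ from the output listed earlier.

```python
from math import sqrt

def cosine_with_fill(d1, d2, fill=0.0):
    """Cosine similarity over the union of keys; missing ratings become `fill`."""
    keys = sorted(set(d1) | set(d2))
    v1 = [float(d1.get(k, fill)) for k in keys]
    v2 = [float(d2.get(k, fill)) for k in keys]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = sqrt(sum(a * a for a in v1)) * sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0  # assumption: all-zero vectors score 0

ratings = {'Shane': {'127 Hours': 5.0, 'Avatar': 4.0, 'Nonstop': 5.0},
           'Joe': {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0},
           'Bob': {'Panic Room': 5.0, 'Nonstop': 5.0}}

for a, b in [('Shane', 'Joe'), ('Joe', 'Bob'), ('Shane', 'Bob')]:
    print(a, b, round(cosine_with_fill(ratings[a], ratings[b]), 3))
```

Passing a different fill, for example the mean of a person's known ratings, is one way to try the alternative imputation strategies mentioned in the note.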