Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Return 'similar score' based on two dictionaries' similarity in Python?

I know it's possible to return how similar two strings are by using the following function:

from difflib import SequenceMatcher
def similar(a, b):
    output=SequenceMatcher(None, a, b).ratio()
    return output

In [37]: similar("Hey, this is a test!","Hey, man, this is a test, man.")
Out[37]: 0.76
In [38]: similar("This should be one.","This should be one.")
Out[38]: 1.0

But is it possible to score two dictionaries based on the similarity of keys and their corresponding values? Not a number of in common keys, or what is in common, but a score from 0 to 1, like the example above with strings.

I'm trying to find the similarity score between ratings['Shane'] and ratings['Joe'] in this dictionary:

ratings={'Shane': {'127 Hours': 3.0, 'Avatar': 4.0, 'Nonstop': 5.0}, 'Joe': {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0}}

I am using Python 2.7.10

like image 516
Shane Smiskol Avatar asked Mar 14 '16 06:03

Shane Smiskol


People also ask

Can we compare 2 dictionaries in Python?

Using == operator to Compare Two Dictionaries Here we are using the equality comparison operator in Python to compare two dictionaries whether both have the same key value pairs or not.

How do you compare two dictionaries with the same key in Python?

You can use set intersection on the dictionaries keys() . Then loop over those and check if the values corresponding to those keys are identical.

How do you check if two values are the same in a dictionary Python?

Use == to check equality of two dictionaries Use == to check if two dictionaries contain the same set of key: value pairs.

How do you know if two dictionaries have the same value?

The simplest technique to check if two or multiple dictionaries are equal is by using the == operator in Python. You can create the dictionaries with any of the methods defined in Python and then compare them using the == operator. It will return True the dictionaries are equals and False if not.


1 Answers

import math

ratings={'Shane': {'127 Hours': 3.0, 'Avatar': 4.0, 'Nonstop': 5.0}, 'Joe': {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0}}

def cosine_similarity(vec1,vec2):
        sum11, sum12, sum22 = 0, 0, 0
        for i in range(len(vec1)):
            x = vec1[i]; y = vec2[i]
            sum11 += x*x
            sum22 += y*y
            sum12 += x*y
        return sum12/math.sqrt(sum11*sum22)

list1 = list(ratings['Shane'].values())
list2 =  list(ratings['Joe'].values())

sim = cosine_similarity(list1,list2)
print(sim)

output

o/p : 0.9205746178983233

Updated When i use :

ratings={'Shane': {'127 Hours': 5.0, 'Avatar': 4.0, 'Nonstop': 5.0},
         'Joe': {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0}}

output :0.9574271077563381

Update2: Normalized length and considered keys

from math import*


ratings={'Shane': {'127 Hours': 5.0, 'Avatar': 4.0, 'Nonstop': 5.0},
         'Joe': {'127 Hours': 5.0, 'Taken 3': 4.0, 'Avatar': 5.0, 'Nonstop': 3.0},
         'Bob': {'Panic Room':5.0,'Nonstop':5.0}}


def square_rooted(x):

    return round(sqrt(sum([a*a for a in x])),3)

def cosine_similarity(x,y):

    input1 = {}
    input2 = {}
    vector2 = []
    vector1 =[]

    if len(x) > len(y):
        input1 = x
        input2 = y
    else:
        input1 = y
        input2 = x


    vector1 = list(input1.values())

    for k in input1.keys():    # Normalizing input vectors. 
        if k in input2:
            vector2.append(float(input2[k])) #picking the values for the common keys from input 2
        else :
            vector2.append(float(0))


    numerator = sum(a*b for a,b in zip(vector2,vector1))
    denominator = square_rooted(vector1)*square_rooted(vector2)
    return round(numerator/float(denominator),3)


print("Similarity between Shane and Joe")
print (cosine_similarity(ratings['Shane'],ratings['Joe']))

print("Similarity between Joe and Bob")
print (cosine_similarity(ratings['Joe'],ratings['Bob']))

print("Similarity between Shane and Bob")
print (cosine_similarity(ratings['Shane'],ratings['Bob']))

output:

Similarity between Shane and Joe
0.887
Similarity between Joe and Bob
0.346
Similarity between Shane and Bob
0.615

Nice explanation between jaccurd and cosine : https://datascience.stackexchange.com/questions/5121/applications-and-differences-for-jaccard-similarity-and-cosine-similarity

i am using Python 3.4

NOTE: I have assigned 0 to missing values. But you can assign some proper values too. Refer : http://www.analyticsvidhya.com/blog/2015/02/7-steps-data-exploration-preparation-building-model-part-2/

like image 93
backtrack Avatar answered Sep 25 '22 01:09

backtrack