cosine similarity optimized implementation

Tags:

I am trying to understand this optimized code to find cosine similarity between users matrix.

def fast_similarity(ratings,epsilon=1e-9):
    # epsilon -> small number for handling dived-by-zero errors
    sim = ratings.T.dot(ratings) + epsilon
    norms = np.array([np.sqrt(np.diagonal(sim))])
    return (sim / norms / norms.T)

If ratings =

           items           
     u  [
     s    [1,2,3]
     e    [4,5,6]
     r    [7,8,9] 
     s  ]

nomrs will be equal to = [1^2 + 5^2 + 9^2]

but why we are writing sim/norms/norms.T to calculate cosine similarity? Any help is appreciated.

577

asked Mar 29 '17 07:03

Manish Kumar

1 Answers

Going through the code we have that:

$first$

And this means that, one the diagonal of the sim matrix we have the result of the multiplication of each column.

You can give it a try if you want using a simple matrix:

$second$

And you can easily check that this gram matrix (that's how this matrix product is named) has this property.

Now the code defines norms that is nothing but an array taking the diagonal of our gram matrix and apply a sqrt on each element of it.

This will give us an array containing the norm value for each column:

$third$

So basically the norms vector contains the norm value of each column of the result matrix.

Once we have all those data we can evaluate the cosine similarity between those users, so we know that cosine similarity is evaluated like:

$forth$

Note that : $fifth$

So we have that our similarity is going to be:

$six$

So we just have to substitute the terms with our code variable to get:

$seven$

And this explain why you have this line of code:

return sim / norms / norms.T

EDIT: Since it seems that I was not clear, every time I am talking about matrix multiplication in this answer I am reffering to the DOT PRODUCT of two matrices.

This actually means that when it's written A*B we actually develop and solve as A.T * B

179

answered Sep 23 '22 09:09

rakwaht

Related questions
                            
                                How to create a queryset that filters multiple fields based on a single condition in Django?
                            
                                how can i check the size of npm packages before download?
                            
                                How do you programmatically change the alpha of a UIVisualEffectView in a NavigationBar?
                            
                                No content to map due to end-of-input when parsing json
                            
                                How to get Facebook tracking url parameters through API
                            
                                Ratchet Store Connection of User & Send Message Outside of Server Instance
                            
                                Is it correct to attach code through volume in kubernetes?
                            
                                How to implement a generic linked list map function in Rust?
                            
                                Updating .stringsdict localization files over-the-air
                            
                                Background colour in Mailchimp
                            
                                geom_area different fill colour when above critical level
                            
                                When are nested functions generally appropriate in JavaScript?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With