Cosine Similarity Code (non-term vectors)

I am trying to find the cosine similarity between 2 vectors (x,y Points) and I am making some silly error that I cannot nail down. Pardon me, I am a newbie, and sorry if I am making a very simple error (which I very likely am).

Thanks for your help

  public static double GetCosineSimilarity(List<Point> V1, List<Point> V2)
    {
        double sim = 0.0d;
        int N = 0;
        N = ((V2.Count < V1.Count)?V2.Count : V1.Count);
        double dotX = 0.0d; double dotY = 0.0d;
        double magX = 0.0d; double magY = 0.0d;
        for (int n = 0; n < N; n++)
        {
            dotX += V1[n].X * V2[n].X;
            dotY += V1[n].Y * V2[n].Y;
            magX += Math.Pow(V1[n].X, 2);
            magY += Math.Pow(V1[n].Y, 2);
        }

        return (dotX + dotY)/(Math.Sqrt(magX) * Math.Sqrt(magY));
    }

Edit: Apart from syntax, my question also concerns the logic, given that I am dealing with vectors of differing lengths. Also, how can the above be generalized to vectors of m dimensions? Thanks.

Mikos asked Sep 26 '11 20:09


2 Answers

If you are in 2 dimensions, each vector can be represented as a single Point, (V1.X, V1.Y) and (V2.X, V2.Y). Then use

public static double GetCosineSimilarity(Point V1, Point V2) {
 // dot product divided by the product of the two magnitudes
 return (V1.X*V2.X + V1.Y*V2.Y)
         / ( Math.Sqrt( Math.Pow(V1.X,2)+Math.Pow(V1.Y,2))
             * Math.Sqrt( Math.Pow(V2.X,2)+Math.Pow(V2.Y,2))
           );
}
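
For reference, the expression in the code above is the usual definition of cosine similarity, written out here for two-component vectors. The symbols a = (a_x, a_y) and b = (b_x, b_y) are notation introduced only for this illustration:

```latex
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{a_x b_x + a_y b_y}{\sqrt{a_x^2 + a_y^2}\,\sqrt{b_x^2 + b_y^2}}
```

As a quick check, a = (1, 0) and b = (1, 1) give 1 / (1 · √2) ≈ 0.7071, which is the cosine of 45°.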

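If you are in higher dimensions then you can represent each vector as List<double>. So, in 4 dimensions the first vector would have components V1 = (V1[0], V1[1], V1[2], V1[3]). Note that when the two lists differ in length, the code below compares only the first N components, where N is the length of the shorter list.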

public static double GetCosineSimilarity(List<double> V1, List<double> V2)
{
    int N = 0;
    N = ((V2.Count < V1.Count) ? V2.Count : V1.Count);
    double dot = 0.0d;
    double mag1 = 0.0d;
    double mag2 = 0.0d;
    for (int n = 0; n < N; n++)
    {
        dot += V1[n] * V2[n];
        mag1 += Math.Pow(V1[n], 2);
        mag2 += Math.Pow(V2[n], 2);
    }

    return dot / (Math.Sqrt(mag1) * Math.Sqrt(mag2));
}
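
The loop above silently drops the extra components of the longer vector. If that is not what you want, here is a minimal sketch of a stricter variant, assuming that mismatched dimensions and zero vectors should be reported as errors. The class name `VectorMath`, the method name `GetCosineSimilarityStrict`, and the choice of exceptions are illustrative assumptions, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;

public static class VectorMath
{
    // Hypothetical stricter variant: rejects vectors of different dimension
    // instead of truncating to the shorter one.
    public static double GetCosineSimilarityStrict(List<double> v1, List<double> v2)
    {
        if (v1 == null) throw new ArgumentNullException(nameof(v1));
        if (v2 == null) throw new ArgumentNullException(nameof(v2));
        if (v1.Count != v2.Count)
            throw new ArgumentException("Vectors must have the same number of components.");

        double dot = 0.0, mag1 = 0.0, mag2 = 0.0;
        for (int n = 0; n < v1.Count; n++)
        {
            dot += v1[n] * v2[n];   // running dot product
            mag1 += v1[n] * v1[n];  // squared magnitude of v1
            mag2 += v2[n] * v2[n];  // squared magnitude of v2
        }

        // The cosine is undefined when either vector has zero magnitude.
        if (mag1 == 0.0 || mag2 == 0.0)
            throw new ArgumentException("Cosine similarity is undefined for a zero vector.");

        return dot / (Math.Sqrt(mag1) * Math.Sqrt(mag2));
    }
}
```

With this sketch, the perpendicular vectors (1, 0) and (0, 1) give 0, and a vector compared with itself gives 1 up to floating-point rounding. Whether to throw, truncate, or pad the shorter vector with zeros depends on what the components mean in your data.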

JohnPS answered Oct 13 '22 10:10


The last line should be

return (dotX + dotY)/(Math.Sqrt(magX) * Math.Sqrt(magY));

HasaniH answered Oct 13 '22 08:10