Pairwise Cosine Similarity using TensorFlow

Question

How can we efficiently calculate pairwise cosine distances between the rows of a matrix using TensorFlow? Given an MxN matrix, the result should be an MxM matrix, where the element at position [i][j] is the cosine distance between the i-th and j-th rows (vectors) of the input matrix.

This can be done with Scikit-Learn fairly easily as follows:

from sklearn.metrics.pairwise import pairwise_distances

pairwise_distances(input_matrix, metric='cosine')

Is there an equivalent method in TensorFlow?

Andre Holzner · Accepted Answer

There is an answer for getting a single cosine distance here: https://stackoverflow.com/a/46057597/288875 (it is based on tf.losses.cosine_distance).

Here is a solution which does this for matrices:

import numpy as np
import tensorflow as tf
from sklearn.metrics.pairwise import pairwise_distances

# Uses the TensorFlow 1.x graph API (tf.Session, tf.placeholder).
with tf.Session() as sess:

    M = 3  # number of rows (vectors)
    N = 3  # number of columns (vector dimension)

    # input
    input_ph = tf.placeholder(tf.float32, shape = (M, N))

    # normalize each row to unit length
    normalized = tf.nn.l2_normalize(input_ph, axis = 1)

    # matrix product of the normalized matrix with its own transpose:
    # entry [i][j] is the dot product of normalized rows i and j,
    # i.e. their cosine similarity
    prod = tf.matmul(normalized, normalized,
                     adjoint_b = True # transpose second matrix
                     )

    # cosine distance = 1 - cosine similarity
    dist = 1 - prod

    input_matrix = np.array(
        [[ 1, 1, 1 ],
         [ 0, 1, 1 ],
         [ 0, 0, 1 ],
         ],
        dtype = 'float32')

    print("input_matrix:")
    print(input_matrix)

    print("sklearn:")
    print(pairwise_distances(input_matrix, metric='cosine'))

    print("tensorflow:")
    print(sess.run(dist, feed_dict = { input_ph : input_matrix }))

which gives me:

input_matrix:
[[ 1.  1.  1.]
 [ 0.  1.  1.]
 [ 0.  0.  1.]]
sklearn:
[[ 0.          0.18350345  0.42264974]
 [ 0.18350345  0.          0.29289323]
 [ 0.42264974  0.29289323  0.        ]]
tensorflow:
[[  5.96046448e-08   1.83503449e-01   4.22649741e-01]
 [  1.83503449e-01   5.96046448e-08   2.92893231e-01]
 [  4.22649741e-01   2.92893231e-01   0.00000000e+00]]

Note that this solution may not be optimal: it calculates all entries of the (symmetric) result matrix, i.e. it does almost twice as many calculations as strictly necessary. This is likely not a problem for small matrices; for large matrices, computing only one triangle of the result with a combination of loops may be faster.
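The symmetry noted above can be checked directly. The following is a minimal NumPy sketch (NumPy rather than TensorFlow, purely for illustration; the helper name cosine_distances_np is hypothetical and not part of any library). It computes the full matrix the same way, confirms that it is symmetric, and pulls out the M(M-1)/2 unique off-diagonal entries. It does not save any computation; it only shows which entries are redundant.

```python
import numpy as np

def cosine_distances_np(x):
    # Hypothetical helper: normalize each row to unit length, then take
    # all pairwise dot products, mirroring the TensorFlow code above.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    normalized = x / norms  # assumes there are no all-zero rows
    return 1.0 - normalized @ normalized.T

input_matrix = np.array([[1, 1, 1],
                         [0, 1, 1],
                         [0, 0, 1]], dtype='float32')

dist = cosine_distances_np(input_matrix)

# The result is symmetric, so the lower triangle repeats the upper one.
is_symmetric = np.allclose(dist, dist.T)

# Indices of the strictly upper triangle: pairs (0,1), (0,2), (1,2).
rows, cols = np.triu_indices(input_matrix.shape[0], k=1)
unique_distances = dist[rows, cols]

print(is_symmetric)
print(unique_distances)
```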

Note also that this does not have a minibatch dimension, so it works for a single matrix only.
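One possible way to add a minibatch dimension is sketched below. It assumes TensorFlow 2.x with eager execution and an input of shape (batch, M, N); the function name batched_cosine_distances is hypothetical. It relies on tf.matmul treating leading dimensions as batch dimensions and on normalizing along the last axis, so treat it as an illustration rather than a tuned implementation.

```python
import numpy as np
import tensorflow as tf

def batched_cosine_distances(x):
    # x has shape (batch, M, N): one MxN matrix per batch element.
    # Normalize every row (last axis) to unit length.
    normalized = tf.nn.l2_normalize(x, axis=-1)
    # tf.matmul multiplies the last two dimensions and broadcasts over
    # the leading batch dimension; transpose_b swaps the last two axes.
    similarity = tf.matmul(normalized, normalized, transpose_b=True)
    # Result has shape (batch, M, M).
    return 1.0 - similarity

input_matrix = np.array([[1, 1, 1],
                         [0, 1, 1],
                         [0, 0, 1]], dtype='float32')

# Two batch elements: the matrix itself and a scaled copy of it.
# Cosine distance ignores vector length, so both should give the same result.
batch = np.stack([input_matrix, 2.0 * input_matrix])

batched = batched_cosine_distances(batch).numpy()
print(batched.shape)
```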

Denis Kuzin · Answer

An elegant solution (the output matches scikit-learn's pairwise_distances function up to float32 rounding):

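import numpy as np
import tensorflow as tf

def compute_cosine_distances(a, b):
    # a has shape (n_a, dim)
    # b has shape (n_b, dim)
    # the result has shape (n_a, n_b)

    # scale every row of a and b to unit length
    normalize_a = tf.nn.l2_normalize(a, 1)
    normalize_b = tf.nn.l2_normalize(b, 1)
    # 1 - cosine similarity of every row of a with every row of b
    distance = 1 - tf.matmul(normalize_a, normalize_b, transpose_b=True)
    return distance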

Test (with eager execution enabled):

input_matrix = np.array([[1, 1, 1],
                         [0, 1, 1],
                         [0, 0, 1]], dtype = 'float32')

compute_cosine_distances(input_matrix, input_matrix)

Output:

<tf.Tensor: id=442, shape=(3, 3), dtype=float32, numpy=
array([[5.9604645e-08, 1.8350345e-01, 4.2264974e-01],
       [1.8350345e-01, 5.9604645e-08, 2.9289323e-01],
       [4.2264974e-01, 2.9289323e-01, 0.0000000e+00]], dtype=float32)>
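
The diagonal entries in the output above are about 6e-08 rather than exactly 0, which is float32 rounding in the normalized dot product. If exact zeros on the diagonal or strictly non-negative distances matter, one option is sketched below. It assumes TensorFlow 2.x with eager execution and a matrix compared with itself (so the result is square); the function name compute_self_cosine_distances is hypothetical.

```python
import numpy as np
import tensorflow as tf

def compute_self_cosine_distances(a):
    # a has shape (n, dim); the result has shape (n, n).
    normalized = tf.nn.l2_normalize(a, axis=1)
    dist = 1.0 - tf.matmul(normalized, normalized, transpose_b=True)
    # Rounding can push values slightly below 0; clamp them to 0.
    dist = tf.maximum(dist, 0.0)
    # The distance of a row to itself is 0 by definition, so overwrite
    # the diagonal. Note: this also forces 0 for all-zero rows.
    zeros = tf.zeros(tf.shape(dist)[:1], dtype=dist.dtype)
    return tf.linalg.set_diag(dist, zeros)

input_matrix = np.array([[1, 1, 1],
                         [0, 1, 1],
                         [0, 0, 1]], dtype='float32')

cleaned = compute_self_cosine_distances(input_matrix).numpy()
print(cleaned)
```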

Tags: matrix, tensorflow

Hiranya Jayathilaka
