
Efficient computation of similarity matrix in Python (NumPy)

Let X be a Bxn NumPy array, for example:

import numpy as np
B = 10
n = 2
X = np.random.random((B, n))

I want to compute the so-called kernel (or similarity) matrix K, which has shape BxB and whose (i,j)-th element is given by:

K(i,j) = fun(x_i, x_j)

where x_t denotes the t-th row of matrix X and fun is some function of x_i, x_j. For instance, this function could be the so-called RBF function, i.e.,

K(i,j) = exp(-||x_i - x_j||^2),

where ||.|| is the Euclidean norm.

A naive way to do this is the following:

K = np.zeros((B, B))
for i in range(X.shape[0]):
    x_i = X[i, :]
    for j in range(X.shape[0]):
        x_j = X[j, :]
        K[i, j] = np.exp(-np.linalg.norm(x_i - x_j, 2) ** 2)

I would like to do the above in a vectorized way, for the sake of efficiency. Could you help?

nullgeppetto asked Feb 21 '18 13:02



1 Answer

This is certainly possible in NumPy alone if you use broadcasting.

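You just have to write out the inner squared-distance calculation in a vectorized way. Inserting a new axis gives one view of X with shape (B, 1, n) and another with shape (1, B, n), so their difference broadcasts to shape (B, B, n); summing the squares over the last axis leaves the BxB matrix of squared distances: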

X1 = X[:, np.newaxis, :]  # shape (B, 1, n)
X2 = X[np.newaxis, :, :]  # shape (1, B, n)
K = np.exp(-np.sum((X1 - X2)**2, axis=-1))  # shape (B, B)
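To check that the broadcasting version matches the double loop from the question, a minimal self-contained sketch might look like the one below. The function names (rbf_kernel_naive, rbf_kernel_broadcast, rbf_kernel_gram) are made up for illustration and do not come from any library. The third function is an optional variant, not part of the approach above: it uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b to avoid the (B, B, n) intermediate array, which may matter when B and n are large.

```python
import numpy as np


def rbf_kernel_naive(X):
    """Reference double loop, as in the question."""
    B = X.shape[0]
    K = np.zeros((B, B))
    for i in range(B):
        for j in range(B):
            K[i, j] = np.exp(-np.linalg.norm(X[i] - X[j]) ** 2)
    return K


def rbf_kernel_broadcast(X):
    """Broadcasting version; builds a (B, B, n) intermediate array."""
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1))


def rbf_kernel_gram(X):
    """Variant based on ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b.

    Only (B, B) arrays are created, at the cost of some rounding error.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, np.newaxis] + sq_norms[np.newaxis, :] - 2.0 * (X @ X.T)
    # Rounding can leave tiny negative values; clip them to zero.
    np.maximum(sq_dists, 0.0, out=sq_dists)
    return np.exp(-sq_dists)


# Same sizes as in the question; the seed is arbitrary.
rng = np.random.default_rng(0)
X = rng.random((10, 2))

K_naive = rbf_kernel_naive(X)
K_broadcast = rbf_kernel_broadcast(X)
K_gram = rbf_kernel_gram(X)

# Compare both vectorized versions against the loop.
print(np.allclose(K_naive, K_broadcast), np.allclose(K_naive, K_gram))
```

The broadcasting version is the simplest to read; the Gram-matrix variant trades a little numerical accuracy for lower memory use, so comparing with np.allclose rather than exact equality is the safer check.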
MB-F answered Sep 19 '22 15:09
