Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate Mahalanobis distance using NumPy only

Tags:

python

numpy

I am looking for NumPy way of calculating Mahalanobis distance between two numpy arrays (x and y). The following code can correctly calculate the same using cdist function of Scipy. Since this function calculates unnecessary matix in my case, I want more straight way of calculating it using NumPy only.

import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[[1,2,3,4,5],
               [5,6,7,8,5],
               [5,6,7,8,5]],
              [[11,22,23,24,5],
               [25,26,27,28,5],
               [5,6,7,8,5]]])
i,j,k = x.shape

xx = x.reshape(i,j*k).T


y = np.array([[[31,32,33,34,5],
               [35,36,37,38,5],
               [5,6,7,8,5]],
              [[41,42,43,44,5],
               [45,46,47,48,5],
               [5,6,7,8,5]]])


yy = y.reshape(i,j*k).T

results =  cdist(xx,yy,'mahalanobis')
results = np.diag(results)
print results



[ 2.28765854  2.75165028  2.75165028  2.75165028  0.          2.75165028
  2.75165028  2.75165028  2.75165028  0.          0.          0.          0.
  0.          0.        ]

My trial:

VI = np.linalg.inv(np.cov(xx,yy))

print np.sqrt(np.dot(np.dot((xx-yy),VI),(xx-yy).T))

Could anybody correct this method?

Here is formula for it:

http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.mahalanobis.html#scipy.spatial.distance.mahalanobis

like image 702
Borys Avatar asked Dec 29 '14 09:12

Borys


People also ask

How do you calculate Mahalanobis distance matrix?

First you calculate the covariance matrix, (S in the equation, “covar mat” in the image). Then you find the inverse of S (“inv-covar” in the image). Then you subtract the mean from v: (66, 640, 44) – (68.0, 600.0, 40.0) to get v-m = (-2, 40, 4).

How do you use Mahalanobis distance?

The lower the Mahalanobis Distance, the closer a point is to the set of benchmark points. A Mahalanobis Distance of 1 or lower shows that the point is right among the benchmark points. This is going to be a good one. The higher it gets from there, the further it is from where the benchmark points are.

How do you find the distance of a matrix in python?

A distance matrix contains the distances computed pairwise between the vectors of matrix/ matrices. scipy. spatial package provides us distance_matrix() method to compute the distance matrix. Generally matrices are in the form of 2-D array and the vectors of the matrix are matrix rows ( 1-D array).


2 Answers

I think your problem lies in the construction of your covariance matrix. Try:

X = np.vstack([xx,yy])
V = np.cov(X.T)
VI = np.linalg.inv(V)
print np.diag(np.sqrt(np.dot(np.dot((xx-yy),VI),(xx-yy).T)))

Output:

[ 2.28765854  2.75165028  2.75165028  2.75165028  0.          2.75165028
  2.75165028  2.75165028  2.75165028  0.          0.          0.          0.
  0.          0.        ]

To do this without the intermediate array implicitly created here, you might have to sacrifice a C loop for a Python one:

A = np.dot((xx-yy),VI)
B = (xx-yy).T
n = A.shape[0]
D = np.empty(n)
for i in range(n):
    D[i] = np.sqrt(np.sum(A[i] * B[:,i]))

EDIT: actually, with np.einsum voodoo you can remove the Python loop and speed it up a lot (on my system, from 84.3 µs to 2.9 µs):

D = np.sqrt(np.einsum('ij,ji->i', A, B))

EDIT: As @Warren Weckesser points out, einsum can be used to do away with the intermediate A and B arrays too:

delta = xx - yy
D = np.sqrt(np.einsum('nj,jk,nk->n', delta, VI, delta))
like image 192
xnx Avatar answered Sep 23 '22 23:09

xnx


Another simple solution which is just as fast as the einsum

e = xx-yy
X = np.vstack([xx,yy])
V = np.cov(X.T) 
p = np.linalg.inv(V)
D = np.sqrt(np.sum(np.dot(e,p) * e, axis = 1))
like image 21
David Avatar answered Sep 25 '22 23:09

David