Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Vectorized matrix manhattan distance in numpy

I'm trying to implement an efficient vectorized numpy to make a Manhattan distance matrix. I'm familiar with the construct used to create an efficient Euclidean distance matrix using dot products as follows:

A = [[1, 2]   
     [2, 1]]

B = [[1, 1],
     [2, 2],
     [1, 3],
     [1, 4]]

def euclidean_distmtx(X, X):
    f = -2 * np.dot(X, Y.T)
    xsq = np.power(X, 2).sum(axis=1).reshape((-1, 1))
    ysq = np.power(Y, 2).sum(axis=1)
    return np.sqrt(xsq + f + ysq)

I want to implement somthing similar but using Manhattan distance instead. So far I've got close but fell short trying to rearrange the absolute differences. As I understand it, the Manhattan distance is

\sum_i |x_i - y_i| = |x_1 - y_1| + |x_2 - y_2| + ...

I tried to solve this by considering if the absolute function didn't apply at all giving me this equivalence

\sum_i x_i - y_i = \sum_i x_i - \sum_i y_i

which gives me the following vectorization

def manhattan_distmtx(X, Y):
    f = np.dot(X.sum(axis=1).reshape(-1, 1), Y.sum(axis=1).reshape(-1, 1).T)
    return f / Y.sum(axis=1) - Y.sum(axis=1)

I think I'm the right track but I just can't move the values around without removing that absolute function around the difference between each vector elements. I'm sure there's a clever trick around the absolute values, possibly by using np.sqrt of a squared value or something but I can't seem to realize it.

like image 675
Syafiq Kamarul Azman Avatar asked Dec 10 '17 06:12

Syafiq Kamarul Azman


People also ask

How do you code a Manhattan distance?

The Manhattan Distance between two points (X1, Y1) and (X2, Y2) is given by |X1 – X2| + |Y1 – Y2|.

How do you find the distance between two vectors in Python?

How to Calculate Euclidean Distance in Python? The formula to calculate the distance between two points (x1 1 , y1 1 ) and (x2 2 , y2 2 ) is d = √[(x2 – x1)2 + (y2 – y1)2].

How can I find the distance between two points in NumPy?

Python has its built-in method, in the math module, that calculates the distance between 2 points in 3d space. However, this only works with Python 3.8 or later. math. dist() takes in two parameters, which are the two points, and returns the Euclidean distance between those points.


1 Answers

I don't think we can leverage BLAS based matrix-multiplication here, as there's no element-wise multiplication involved here. But, we have few alternatives.

Approach #1

We can use Scipy's cdist that features the Manhattan distance with its optional metric argument set as 'cityblock' -

from scipy.spatial.distance import cdist

out = cdist(A, B, metric='cityblock')

Approach #2 - A

We can also leverage broadcasting, but with more memory requirements -

np.abs(A[:,None] - B).sum(-1)

Approach #2 - B

That could be re-written to use less memory with slicing and summations for input arrays with two cols -

np.abs(A[:,0,None] - B[:,0]) + np.abs(A[:,1,None] - B[:,1])

Approach #2 - C

Porting over the broadcasting version to make use of faster absolute computation with numexpr module -

import numexpr as ne
A3D = A[:,None]
out = ne.evaluate('sum(abs(A3D-B),2)')
like image 60
Divakar Avatar answered Sep 17 '22 11:09

Divakar