I'm trying to implement an efficient vectorized numpy
to make a Manhattan distance matrix. I'm familiar with the construct used to create an efficient Euclidean distance matrix using dot products as follows:
A = [[1, 2]
[2, 1]]
B = [[1, 1],
[2, 2],
[1, 3],
[1, 4]]
def euclidean_distmtx(X, X):
f = -2 * np.dot(X, Y.T)
xsq = np.power(X, 2).sum(axis=1).reshape((-1, 1))
ysq = np.power(Y, 2).sum(axis=1)
return np.sqrt(xsq + f + ysq)
I want to implement somthing similar but using Manhattan distance instead. So far I've got close but fell short trying to rearrange the absolute differences. As I understand it, the Manhattan distance is
I tried to solve this by considering if the absolute function didn't apply at all giving me this equivalence
which gives me the following vectorization
def manhattan_distmtx(X, Y):
f = np.dot(X.sum(axis=1).reshape(-1, 1), Y.sum(axis=1).reshape(-1, 1).T)
return f / Y.sum(axis=1) - Y.sum(axis=1)
I think I'm the right track but I just can't move the values around without removing that absolute function around the difference between each vector elements. I'm sure there's a clever trick around the absolute values, possibly by using np.sqrt
of a squared value or something but I can't seem to realize it.
The Manhattan Distance between two points (X1, Y1) and (X2, Y2) is given by |X1 – X2| + |Y1 – Y2|.
How to Calculate Euclidean Distance in Python? The formula to calculate the distance between two points (x1 1 , y1 1 ) and (x2 2 , y2 2 ) is d = √[(x2 – x1)2 + (y2 – y1)2].
Python has its built-in method, in the math module, that calculates the distance between 2 points in 3d space. However, this only works with Python 3.8 or later. math. dist() takes in two parameters, which are the two points, and returns the Euclidean distance between those points.
I don't think we can leverage BLAS based matrix-multiplication here, as there's no element-wise multiplication involved here. But, we have few alternatives.
Approach #1
We can use Scipy's cdist
that features the Manhattan distance with its optional metric argument set as 'cityblock'
-
from scipy.spatial.distance import cdist
out = cdist(A, B, metric='cityblock')
Approach #2 - A
We can also leverage broadcasting
, but with more memory requirements -
np.abs(A[:,None] - B).sum(-1)
Approach #2 - B
That could be re-written to use less memory with slicing and summations for input arrays with two cols -
np.abs(A[:,0,None] - B[:,0]) + np.abs(A[:,1,None] - B[:,1])
Approach #2 - C
Porting over the broadcasting
version to make use of faster absolute
computation with numexpr
module -
import numexpr as ne
A3D = A[:,None]
out = ne.evaluate('sum(abs(A3D-B),2)')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With