Vectorized matrix manhattan distance in numpy

Tags:

python

vectorization

numpy

I'm trying to implement an efficient vectorized NumPy function to compute a Manhattan distance matrix. I'm familiar with the construct used to compute an efficient Euclidean distance matrix using dot products, as follows:

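import numpy as np

A = np.array([[1, 2],
              [2, 1]])

B = np.array([[1, 1],
              [2, 2],
              [1, 3],
              [1, 4]])

def euclidean_distmtx(X, Y):
    # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2; the cross term is one matrix product
    f = -2 * np.dot(X, Y.T)
    xsq = np.power(X, 2).sum(axis=1).reshape((-1, 1))  # shape (m, 1)
    ysq = np.power(Y, 2).sum(axis=1)                    # shape (n,)
    return np.sqrt(xsq + f + ysq)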
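A quick sanity check, assuming NumPy array inputs, that the dot-product expansion matches a brute-force computation over all pairs. The `np.maximum(..., 0)` clamp is an addition not in the snippet above: it guards against tiny negative values from floating-point cancellation before the square root.

```python
import numpy as np

def euclidean_distmtx(X, Y):
    # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, cross term as one matrix product
    f = -2 * np.dot(X, Y.T)
    xsq = np.power(X, 2).sum(axis=1).reshape((-1, 1))
    ysq = np.power(Y, 2).sum(axis=1)
    # clamp tiny negatives caused by floating-point cancellation
    return np.sqrt(np.maximum(xsq + f + ysq, 0))

A = np.array([[1, 2], [2, 1]])
B = np.array([[1, 1], [2, 2], [1, 3], [1, 4]])

fast = euclidean_distmtx(A, B)
# brute force: materialise every pairwise difference, shape (2, 4, 2)
brute = np.linalg.norm(A[:, None] - B, axis=-1)
print(np.allclose(fast, brute))  # True
```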

I want to implement something similar for the Manhattan distance. I got close but fell short trying to rearrange the absolute differences. As I understand it, the Manhattan distance is

$\sum_i |x_i - y_i| = |x_1 - y_1| + |x_2 - y_2| + ...$

I tried ignoring the absolute value entirely, which gives this identity

$\sum_i (x_i - y_i) = \sum_i x_i - \sum_i y_i$
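Dropping the absolute value is exactly where this breaks. By the triangle inequality the signed sum only bounds the Manhattan distance from below, with equality only when every difference $x_i - y_i$ has the same sign. For example, with $x = (1, 0)$ and $y = (0, 1)$:

$\left|\sum_i (x_i - y_i)\right| \le \sum_i |x_i - y_i|$

$\sum_i (x_i - y_i) = (1 - 0) + (0 - 1) = 0 \ne 2 = |1 - 0| + |0 - 1|$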

which gives me the following vectorization

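def manhattan_distmtx(X, Y):
    # outer product of row sums: f[i, j] = sum(X[i]) * sum(Y[j])
    f = np.dot(X.sum(axis=1).reshape(-1, 1), Y.sum(axis=1).reshape(-1, 1).T)
    # dividing by sum(Y[j]) leaves sum(X[i]), so this returns sum(X[i]) - sum(Y[j])
    return f / Y.sum(axis=1) - Y.sum(axis=1)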
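To see concretely what this attempt computes, here is a small check on the example `A` and `B` (converted to NumPy arrays, an assumption needed for the indexing). The attempt reduces to a signed difference of row sums, which differs from the true Manhattan distances wherever the coordinate differences have mixed signs.

```python
import numpy as np

def manhattan_distmtx(X, Y):
    # the question's attempt: outer product of row sums, then divide out Y's row sums
    f = np.dot(X.sum(axis=1).reshape(-1, 1), Y.sum(axis=1).reshape(-1, 1).T)
    return f / Y.sum(axis=1) - Y.sum(axis=1)

A = np.array([[1, 2], [2, 1]])
B = np.array([[1, 1], [2, 2], [1, 3], [1, 4]])

attempt = manhattan_distmtx(A, B)
# the attempt simplifies to a signed difference of row sums
signed = A.sum(axis=1)[:, None] - B.sum(axis=1)
# the true Manhattan distances, via broadcasting
manhattan = np.abs(A[:, None] - B).sum(-1)

print(np.allclose(attempt, signed))  # True
# signed    -> [[1, -1, -1, -2], [1, -1, -1, -2]]
# manhattan -> [[1,  1,  1,  2], [1,  1,  3,  4]]
```

Note also that the division step fails (division by zero) whenever a row of `Y` sums to zero, even though the simplified `signed` form does not.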

I think I'm on the right track, but I can't rearrange the terms without dropping the absolute value around the difference between each pair of vector elements. I'm sure there's a clever trick around the absolute values, possibly taking np.sqrt of a squared value, but I can't see it.


asked Dec 10 '17 06:12

Syafiq Kamarul Azman

1 Answer

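I don't think we can leverage BLAS-based matrix multiplication here. The Euclidean trick works because the squared distance expands into a term containing the dot product of x and y, which is a matrix product over all pairs; the absolute difference has no such expansion into element-wise products. But we have a few alternatives.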
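One special case where a matrix product does apply (an addition, not part of the original answer): if every entry is 0 or 1, then |x_i - y_i| = x_i + y_i - 2 x_i y_i, so the Manhattan distance (here equal to the Hamming distance) is the row sums minus twice X·Yᵀ. A minimal sketch assuming binary inputs; `manhattan_distmtx_binary` is a hypothetical name. Casting to float is deliberate, since NumPy only dispatches `matmul` to BLAS for floating-point and complex dtypes.

```python
import numpy as np

def manhattan_distmtx_binary(X, Y):
    # Valid only for 0/1 inputs, where |x - y| == x + y - 2*x*y elementwise.
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    xs = X.sum(axis=1)[:, None]  # shape (m, 1)
    ys = Y.sum(axis=1)           # shape (n,)
    # cross term as a single matrix product (BLAS-backed for float dtypes)
    return xs + ys - 2 * (X @ Y.T)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5, 8))
Y = rng.integers(0, 2, size=(3, 8))

fast = manhattan_distmtx_binary(X, Y)
brute = np.abs(X[:, None] - Y).sum(-1)
print(np.allclose(fast, brute))  # True
```

For general real-valued inputs this identity does not hold, which is why the alternatives below are needed.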

Approach #1

We can use SciPy's cdist, which supports the Manhattan distance through its optional metric argument set to 'cityblock':

from scipy.spatial.distance import cdist

out = cdist(A, B, metric='cityblock')

Approach #2 - A

We can also leverage broadcasting, but it needs more memory, since it builds an intermediate array of shape (len(A), len(B), number of columns):

np.abs(A[:,None] - B).sum(-1)
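If that intermediate array is too large, one middle ground is to broadcast over blocks of rows of `A`, so the temporary has shape (chunk, n, d) instead of (m, n, d). This is a sketch, not part of the original answer: `manhattan_chunked` is a hypothetical helper and the default `chunk` is an arbitrary choice to tune for your memory budget.

```python
import numpy as np

def manhattan_chunked(A, B, chunk=1024):
    # Hypothetical helper: broadcasting per block of A's rows bounds the
    # temporary array to shape (chunk, n, d) instead of (m, n, d).
    A = np.asarray(A)
    B = np.asarray(B)
    out = np.empty((A.shape[0], B.shape[0]), dtype=np.result_type(A, B))
    for start in range(0, A.shape[0], chunk):
        block = A[start:start + chunk]
        out[start:start + chunk] = np.abs(block[:, None] - B).sum(-1)
    return out

A = np.array([[1, 2], [2, 1]])
B = np.array([[1, 1], [2, 2], [1, 3], [1, 4]])
print(manhattan_chunked(A, B, chunk=1))
# [[1 1 1 2]
#  [1 1 3 4]]
```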

Approach #2 - B

For input arrays with two columns, that could be rewritten with slicing and summation to use less memory:

np.abs(A[:,0,None] - B[:,0]) + np.abs(A[:,1,None] - B[:,1])
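The same slicing idea extends to any number of columns by looping over them and accumulating, so each step only creates (m, n) temporaries. A sketch under that assumption; `manhattan_by_columns` is a hypothetical name. The Python loop runs once per column, so it suits inputs with few columns.

```python
import numpy as np

def manhattan_by_columns(A, B):
    # Hypothetical helper generalising the two-column version to d columns:
    # accumulate |A[:, k, None] - B[:, k]| one column at a time.
    A = np.asarray(A)
    B = np.asarray(B)
    out = np.zeros((A.shape[0], B.shape[0]), dtype=np.result_type(A, B))
    for k in range(A.shape[1]):
        out += np.abs(A[:, k, None] - B[:, k])
    return out

A = np.array([[1, 2], [2, 1]])
B = np.array([[1, 1], [2, 2], [1, 3], [1, 4]])
print(manhattan_by_columns(A, B))
# [[1 1 1 2]
#  [1 1 3 4]]
```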

Approach #2 - C

We can port the broadcasting version to the numexpr module to make use of its faster absolute-value computation:

import numexpr as ne
A3D = A[:,None]
out = ne.evaluate('sum(abs(A3D-B),2)')

answered Sep 17 '22 11:09

Divakar
