Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate Euclidean distance on numpy row-row cross product?

I have two numpy arrays with n number of coordinates (two items per row).

coords_a = np.random.random((20, 2))
coords_b = np.random.random((20, 2))

Now, for each combination of rows, I want to compute a function and save the return value as item in a matrix. The resulting array should therefore have shape (20, 20) and can be "lazily" computed as shown below. As exemplary function, the Euclidean distance is used.

def euclidean_dist(x1: float, y1: float, x2: float, y2: float) -> float:
    """Return the euclidean distance between two the points (x1, y1) and (x2, y2)."""
    return np.sqrt(np.square(x1 - x2) + np.square(y1 - y2))

matrix = []
for a in coords_a:
    row = []
    for b in coords_b:
        row.append(euclidean_dist(*a, *b))
    matrix.append(row)
    
matrix = np.array(matrix)

As you can imagine, this nested for loop is very time consuming taking over 25 seconds with just 2000 coordinate pairs. Is there a recommended way of vectoring this sort of cross product?

Thanks in advance.

like image 486
BBQuercus Avatar asked Oct 16 '25 17:10

BBQuercus


2 Answers

I would like to add my 2 cents since not every function is already implemented in numpy or scipy. In general you can use numpy broadcasting to achieve vectorized solution. For the specific case of euclidean distance here how you do it:

import numpy as np

# Define the arrays of coordinates
coords_a = np.random.random((20, 2))
coords_b = np.random.random((20, 2))

# Expand their dimensions
a = coords_a[:, None]
b = coords_b[None, None]

# Use broadcasting to compute pairwise difference
d = a-b

# Apply formula for euclidean distance
r = np.sqrt(np.sum(d**2, axis=-1)) 

In terms of time performance for this specific case scipy.spatial.distance.cdist is way faster, yet not every function is available:

import numpy as np
from scipy.spatial.distance import cdist

a = np.random.random((10_000, 2))
b = np.random.random((10_000, 2))

euc_broadcast = lambda a,b: np.sqrt(np.sum(np.square(a[:, None]-b[None, :]), axis=-1))

%timeit euc_broadcast(a, b)
3.39 s ± 149 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit cdist(a, b)
603 ms ± 13.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
like image 184
FBruzzesi Avatar answered Oct 18 '25 06:10

FBruzzesi


For your specific example you can do:

from scipy.spatial.distance import cdist
cdist(coords_b,coords_a)

In general, vectorizing depends on your function.

like image 29
Ehsan Avatar answered Oct 18 '25 05:10

Ehsan



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!