I have the following code. It is taking forever in Python. There must be a way to translate this calculation into a broadcast... <pre class="prettyprint"><code>def euclidean_square(a,b): squares = np.zeros((a.shape[0],b.shape[0])) for i in range(squares.shape[0]): for j in range(squares.shape[1]): diff = a[i,:] - b[j,:] sqr = diff**2.0 squares[i,j] = np.sum(sqr) return squares </code></pre>

You can use <code>np.einsum</code> after calculating the differences in a <code>broadcasted way</code>, like so - <pre class="prettyprint"><code>ab = a[:,None,:] - b out = np.einsum('ijk,ijk->ij',ab,ab) </code></pre> Or use <code>scipy's cdist</code> with its optional metric argument set as <code>'sqeuclidean'</code> to give us the squared euclidean distances as needed for our problem, like so - <pre class="prettyprint"><code>from scipy.spatial.distance import cdist out = cdist(a,b,'sqeuclidean') </code></pre>

I collected the different methods proposed here, and in two other questions, and measured the speed of the different methods: <pre class="prettyprint"><code>import numpy as np import scipy.spatial import sklearn.metrics def dist_direct(x, y): d = np.expand_dims(x, -2) - y return np.sum(np.square(d), axis=-1) def dist_einsum(x, y): d = np.expand_dims(x, -2) - y return np.einsum('ijk,ijk->ij', d, d) def dist_scipy(x, y): return scipy.spatial.distance.cdist(x, y, "sqeuclidean") def dist_sklearn(x, y): return sklearn.metrics.pairwise.pairwise_distances(x, y, "sqeuclidean") def dist_layers(x, y): res = np.zeros((x.shape[0], y.shape[0])) for i in range(x.shape[1]): res += np.subtract.outer(x[:, i], y[:, i])**2 return res # inspired by the excellent https://github.com/droyed/eucl_dist def dist_ext1(x, y): nx, p = x.shape x_ext = np.empty((nx, 3*p)) x_ext[:, :p] = 1 x_ext[:, p:2*p] = x x_ext[:, 2*p:] = np.square(x) ny = y.shape[0] y_ext = np.empty((3*p, ny)) y_ext[:p] = np.square(y).T y_ext[p:2*p] = -2*y.T y_ext[2*p:] = 1 return x_ext.dot(y_ext) # https://stackoverflow.com/a/47877630/648741 def dist_ext2(x, y): return np.einsum('ij,ij->i', x, x)[:,None] + np.einsum('ij,ij->i', y, y) - 2 * x.dot(y.T) </code></pre> I use <code>timeit</code> to compare the speed of the different methods. For the comparison, I use vectors of length 10, with 100 vectors in the first group, and 1000 vectors in the second group. <pre class="prettyprint"><code>import timeit p = 10 x = np.random.standard_normal((100, p)) y = np.random.standard_normal((1000, p)) for method in dir(): if not method.startswith("dist_"): continue t = timeit.timeit(f"{method}(x, y)", number=1000, globals=globals()) print(f"{method:12} {t:5.2f}ms") </code></pre> On my laptop, the results are as follows: <pre class="prettyprint"><code>dist_direct 5.07ms dist_einsum 3.43ms dist_ext1 0.20ms <-- fastest dist_ext2 0.35ms dist_layers 2.82ms dist_scipy 0.60ms dist_sklearn 0.67ms </code></pre> While the two methods <code>dist_ext1</code> and <code>dist_ext2</code>, both based on the idea of writing <code>(x-y)**2</code> as <code>x**2 - 2*x*y + y**2</code>, are very fast, there is a downside: When the distance between <code>x</code> and <code>y</code> is very small, due to cancellation error the numerical result can sometimes be (very slightly) negative.

NumPy Broadcasting: Calculating sum of squared differences between two arrays

I have the following code. It is taking forever in Python. There must be a way to translate this calculation into a broadcast...

def euclidean_square(a,b):
    squares = np.zeros((a.shape[0],b.shape[0]))
    for i in range(squares.shape[0]):
        for j in range(squares.shape[1]):
            diff = a[i,:] - b[j,:]
            sqr = diff**2.0
            squares[i,j] = np.sum(sqr)
    return squares

How do you find the difference between two arrays in NumPy?

subtract() function is used when we want to compute the difference of two array.It returns the difference of arr1 and arr2, element-wise. Parameters : arr1 : [array_like or scalar]1st Input array.

Can Python broadcast multiple dimensions?

The arrays can be broadcast together iff they are compatible with all dimensions. After broadcasting, each array behaves as if it had shape equal to the element-wise maximum of shapes of the two input arrays.

What is broadcasting in NumPy explain with example what are different rules in broadcasting?

The term broadcasting refers to the ability of NumPy to treat arrays of different shapes during arithmetic operations. Arithmetic operations on arrays are usually done on corresponding elements. If two arrays are of exactly the same shape, then these operations are smoothly performed.

What is broadcasting in NumPy?

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

You can use np.einsum after calculating the differences in a broadcasted way, like so -

ab = a[:,None,:] - b
out = np.einsum('ijk,ijk->ij',ab,ab)

Or use scipy's cdist with its optional metric argument set as 'sqeuclidean' to give us the squared euclidean distances as needed for our problem, like so -

from scipy.spatial.distance import cdist
out = cdist(a,b,'sqeuclidean')

I collected the different methods proposed here, and in two other questions, and measured the speed of the different methods:

import numpy as np
import scipy.spatial
import sklearn.metrics

def dist_direct(x, y):
    d = np.expand_dims(x, -2) - y
    return np.sum(np.square(d), axis=-1)

def dist_einsum(x, y):
    d = np.expand_dims(x, -2) - y
    return np.einsum('ijk,ijk->ij', d, d)

def dist_scipy(x, y):
    return scipy.spatial.distance.cdist(x, y, "sqeuclidean")

def dist_sklearn(x, y):
    return sklearn.metrics.pairwise.pairwise_distances(x, y, "sqeuclidean")

def dist_layers(x, y):
    res = np.zeros((x.shape[0], y.shape[0]))
    for i in range(x.shape[1]):
        res += np.subtract.outer(x[:, i], y[:, i])**2
    return res

# inspired by the excellent https://github.com/droyed/eucl_dist
def dist_ext1(x, y):
    nx, p = x.shape
    x_ext = np.empty((nx, 3*p))
    x_ext[:, :p] = 1
    x_ext[:, p:2*p] = x
    x_ext[:, 2*p:] = np.square(x)

    ny = y.shape[0]
    y_ext = np.empty((3*p, ny))
    y_ext[:p] = np.square(y).T
    y_ext[p:2*p] = -2*y.T
    y_ext[2*p:] = 1

    return x_ext.dot(y_ext)

# https://stackoverflow.com/a/47877630/648741
def dist_ext2(x, y):
    return np.einsum('ij,ij->i', x, x)[:,None] + np.einsum('ij,ij->i', y, y) - 2 * x.dot(y.T)

I use timeit to compare the speed of the different methods. For the comparison, I use vectors of length 10, with 100 vectors in the first group, and 1000 vectors in the second group.

import timeit

p = 10
x = np.random.standard_normal((100, p))
y = np.random.standard_normal((1000, p))

for method in dir():
    if not method.startswith("dist_"):
        continue
    t = timeit.timeit(f"{method}(x, y)", number=1000, globals=globals())
    print(f"{method:12} {t:5.2f}ms")

On my laptop, the results are as follows:

dist_direct   5.07ms
dist_einsum   3.43ms
dist_ext1     0.20ms  <-- fastest
dist_ext2     0.35ms
dist_layers   2.82ms
dist_scipy    0.60ms
dist_sklearn  0.67ms

While the two methods dist_ext1 and dist_ext2, both based on the idea of writing (x-y)**2 as x**2 - 2*x*y + y**2, are very fast, there is a downside: When the distance between x and y is very small, due to cancellation error the numerical result can sometimes be (very slightly) negative.

NumPy Broadcasting: Calculating sum of squared differences between two arrays

Tags:

python

numpy

array-broadcasting

Chris

People also ask

2 Answers

Divakar

jochen

Recent Activity

Donate For Us

NumPy Broadcasting: Calculating sum of squared differences between two arrays

Tags:

python

numpy

array-broadcasting

Chris

People also ask

2 Answers

Divakar

jochen

Related questions

Recent Activity

Donate For Us