Sum the squared differences between two NumPy arrays




Suppose I have the following 2 arrays:

import numpy as np

For every row a_row in a, I would like to get the sum of squared differences between a_row and every row in b. The resulting array would be a 2 by 4 array. The expected result is the following:

array([[ 11.,   5.,  14.,  10.],
       [  2.,   2.,   1.,   3.]])
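Written out, the entry in row i, column j of the expected result is the squared Euclidean distance between row i of a and row j of b. A fully vectorized solution therefore has to produce every (i, j) pair at once rather than one row of a at a time:

```latex
c_{ij} = \sum_{k} \left( a_{ik} - b_{jk} \right)^2, \qquad i = 1, 2, \quad j = 1, \dots, 4
```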

I've already implemented a solution using a loop:

c = np.zeros((a.shape[0], b.shape[0]))  # one row per row of a, one column per row of b
for e in range(a.shape[0]):
    c[e, :] = np.sum(np.square(b - a[e, :]), axis=1)
print(c)

What I need is a fully vectorized solution, i.e. no loop is required.

2 Answers

Here is a NumPythonic approach: add a new axis to b with b[:,None] so that a can be subtracted from it directly via broadcasting:

>>> np.square(b[:,None] - a).sum(axis=2).T
array([[11,  5, 14, 10],
       [ 2,  2,  1,  3]])
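The shapes are easiest to follow on a small example. The question's actual a and b are not shown, so this sketch uses hypothetical random integer arrays of shape (2, 3) and (4, 3). It also checks the result against the loop from the question and shows a variant that puts the new axis on a instead, so no transpose is needed.

```python
import numpy as np

# Hypothetical inputs: the question's real a and b are not shown,
# so these shapes (2 x 3 and 4 x 3) are assumptions for illustration.
rng = np.random.default_rng(0)
a = rng.integers(0, 5, size=(2, 3))
b = rng.integers(0, 5, size=(4, 3))

# b[:, None] inserts a new axis: shape (4, 3) -> (4, 1, 3).
# Subtracting a (shape (2, 3)) broadcasts to shape (4, 2, 3):
# diff[j, i, k] = b[j, k] - a[i, k]
diff = b[:, None] - a
print(diff.shape)  # (4, 2, 3)

# Square, sum over the feature axis, then transpose (4, 2) -> (2, 4):
# result[i, j] = sum_k (a[i, k] - b[j, k]) ** 2
result = np.square(diff).sum(axis=2).T

# Variant without a transpose: (2, 1, 3) - (1, 4, 3) broadcasts to (2, 4, 3).
result_no_T = np.square(a[:, None, :] - b[None, :, :]).sum(axis=2)

# Cross-check against the loop from the question.
c = np.zeros((a.shape[0], b.shape[0]))
for e in range(a.shape[0]):
    c[e, :] = np.sum(np.square(b - a[e, :]), axis=1)

print(np.array_equal(result, c), np.array_equal(result_no_T, c))  # True True
```

Both forms build the full three-dimensional difference array before summing, so memory use grows with rows(a) × rows(b) × columns.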

If you have access to SciPy, you could do:

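import numpy as np
from scipy.spatial.distance import cdist

# 'sqeuclidean' returns squared distances directly; cdist(a, b)**2 gives the
# same values but takes a square root and then squares it again.
x = cdist(a, b, 'sqeuclidean')
# print(x)
# array([[ 11.,   5.,  14.,  10.],
#        [  2.,   2.,   1.,   3.]])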

This uses cdist, which is vectorized and runs in compiled code, so it is fast. You might get a bit more speed with Numba or Cython, but whether that pays off depends on the size of your arrays in practice.
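When the arrays are large and SciPy is not available, the broadcasting answer materializes a (rows of b, rows of a, columns) intermediate array, which can dominate memory. A different, commonly used technique expands the square as ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 a_i·b_j, so the only large operation is one matrix product. This is a sketch under that approach, not either answerer's method; the function name sq_dists_expanded and the random inputs are hypothetical.

```python
import numpy as np

def sq_dists_expanded(a, b):
    """Pairwise squared Euclidean distances via
    ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 a_i . b_j.
    Avoids the (len(a), len(b), n_features) intermediate array."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_sq = np.einsum('ij,ij->i', a, a)  # squared norm of each row of a, shape (m,)
    b_sq = np.einsum('ij,ij->i', b, b)  # squared norm of each row of b, shape (p,)
    d = a_sq[:, None] + b_sq[None, :] - 2.0 * (a @ b.T)  # shape (m, p)
    # Cancellation can leave tiny negative values; clip them to zero.
    return np.maximum(d, 0.0)

# Hypothetical inputs (the question's arrays are not shown).
rng = np.random.default_rng(1)
a = rng.normal(size=(2, 5))
b = rng.normal(size=(4, 5))

fast = sq_dists_expanded(a, b)
reference = np.square(b[:, None] - a).sum(axis=2).T
print(np.allclose(fast, reference))  # True
```

The trade-off is floating-point cancellation: for rows that are nearly identical, the expanded form can be slightly less accurate than direct subtraction, which is why the result is clipped at zero.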

