I was interested in calculating various spatial distances between two numpy arrays (x and y).
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.cdist.html
import numpy as np
from scipy.spatial.distance import cdist
x = np.array([[[1,2,3,4,5],
[5,6,7,8,5],
[5,6,7,8,5]],
[[11,22,23,24,5],
[25,26,27,28,5],
[5,6,7,8,5]]])
i,j,k = x.shape
xx = x.reshape(i,j*k).T
y = np.array([[[31,32,33,34,5],
[35,36,37,38,5],
[5,6,7,8,5]],
[[41,42,43,44,5],
[45,46,47,48,5],
[5,6,7,8,5]]])
yy = y.reshape(i,j*k).T
results = cdist(xx,yy,'euclidean')
print(results)
However, this produces far more results than I need, because cdist returns the distance between every combination of points. How can I limit it to only the pairs I want?
I want to calculate the distance between [1,11] and [31,41], between [2,22] and [32,42], and so on.
In this method, we first initialize two numpy arrays. Then we take the difference of the two arrays, compute the dot product of that difference with its transpose, and take the square root of the result. This is another way to implement Euclidean distance.
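As a minimal sketch of the dot-product approach described above, using the first pair of points from the question ([1,11] and [31,41]); the variable names a, b and diff are chosen here for illustration:

```python
import numpy as np

# First pair of points from the question.
a = np.array([1.0, 11.0])
b = np.array([31.0, 41.0])

diff = a - b
# For 1-D arrays, diff.T is the same as diff, so np.dot(diff, diff)
# is simply the sum of the squared differences.
dist = np.sqrt(np.dot(diff, diff))

print(dist)  # about 42.4264
print(np.isclose(dist, np.linalg.norm(a - b)))  # True
```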
In a two-dimensional space, the Manhattan distance between two points (x1, y1) and (x2, y2) is calculated as: distance = |x2 - x1| + |y2 - y1|. By its nature, the Manhattan distance is always equal to or larger than the straight-line distance.
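To make the comparison concrete, here is a small sketch that computes both distances for the same pair of points from the question; the names p, q, manhattan and euclidean are illustrative only:

```python
import numpy as np

p = np.array([1.0, 11.0])   # (x1, y1)
q = np.array([31.0, 41.0])  # (x2, y2)

manhattan = np.sum(np.abs(q - p))          # |x2 - x1| + |y2 - y1|
euclidean = np.sqrt(np.sum((q - p) ** 2))  # straight-line distance

print(manhattan)  # 60.0
print(euclidean)  # about 42.4264
print(manhattan >= euclidean)  # True
```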
If you just want the distances between each pair of points, then you don't need to calculate a full distance matrix.
Instead, calculate it directly:
import numpy as np
x = np.array([[[1,2,3,4,5],
[5,6,7,8,5],
[5,6,7,8,5]],
[[11,22,23,24,5],
[25,26,27,28,5],
[5,6,7,8,5]]])
y = np.array([[[31,32,33,34,5],
[35,36,37,38,5],
[5,6,7,8,5]],
[[41,42,43,44,5],
[45,46,47,48,5],
[5,6,7,8,5]]])
xx = x.reshape(2, -1)
yy = y.reshape(2, -1)
dist = np.hypot(*(xx - yy))
print(dist)
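As a quick sanity check (a sketch, not part of the original answer), the directly computed distances should match the diagonal of the full cdist matrix from the question, since the question's xx and yy are the transposes of the ones used here:

```python
import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[[1, 2, 3, 4, 5], [5, 6, 7, 8, 5], [5, 6, 7, 8, 5]],
              [[11, 22, 23, 24, 5], [25, 26, 27, 28, 5], [5, 6, 7, 8, 5]]])
y = np.array([[[31, 32, 33, 34, 5], [35, 36, 37, 38, 5], [5, 6, 7, 8, 5]],
              [[41, 42, 43, 44, 5], [45, 46, 47, 48, 5], [5, 6, 7, 8, 5]]])

xx = x.reshape(2, -1)  # shape (2, 15): one column per point
yy = y.reshape(2, -1)

direct = np.hypot(*(xx - yy))          # the 15 pairwise distances
full = cdist(xx.T, yy.T, 'euclidean')  # 15 x 15 matrix of all combinations

# The wanted distances are the diagonal of the full matrix.
print(np.allclose(direct, np.diag(full)))  # True
```

The direct version computes 15 values instead of 225, which is why it scales better when only matching pairs are needed.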
To explain a bit more about what's going on, first we reshape the arrays such that they have a 2xN shape (-1 is a placeholder that tells numpy to calculate the correct size along that axis automatically):
In [2]: x.reshape(2, -1)
Out[2]:
array([[ 1, 2, 3, 4, 5, 5, 6, 7, 8, 5, 5, 6, 7, 8, 5],
[11, 22, 23, 24, 5, 25, 26, 27, 28, 5, 5, 6, 7, 8, 5]])
Therefore, when we subtract xx and yy, we'll get a 2xN array:
In [3]: xx - yy
Out[3]:
array([[-30, -30, -30, -30, 0, -30, -30, -30, -30, 0, 0, 0, 0,
0, 0],
[-30, -20, -20, -20, 0, -20, -20, -20, -20, 0, 0, 0, 0,
0, 0]])
We can then unpack this into dx and dy components:
In [4]: dx, dy = xx - yy
In [5]: dx
Out[5]:
array([-30, -30, -30, -30, 0, -30, -30, -30, -30, 0, 0, 0, 0,
0, 0])
In [6]: dy
Out[6]:
array([-30, -20, -20, -20, 0, -20, -20, -20, -20, 0, 0, 0, 0,
0, 0])
And calculate the distance (np.hypot is equivalent to np.sqrt(dx**2 + dy**2)):
In [7]: np.hypot(dx, dy)
Out[7]:
array([ 42.42640687, 36.05551275, 36.05551275, 36.05551275,
0. , 36.05551275, 36.05551275, 36.05551275,
36.05551275, 0. , 0. , 0. ,
0. , 0. , 0. ])
Or we can have the unpacking done automatically and do it all in one step:
In [8]: np.hypot(*(xx - yy))
Out[8]:
array([ 42.42640687, 36.05551275, 36.05551275, 36.05551275,
0. , 36.05551275, 36.05551275, 36.05551275,
36.05551275, 0. , 0. , 0. ,
0. , 0. , 0. ])
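Note that unpacking into np.hypot only works when each point has exactly two components. If the points had more, one possible generalization (an assumption beyond the original answer) is np.linalg.norm along the first axis. The arrays a and b below are made-up three-component data for illustration:

```python
import numpy as np

# Hypothetical data: 3 components per point, one column per point.
a = np.array([[1.0, 2.0], [11.0, 22.0], [0.0, 3.0]])    # shape (3, 2)
b = np.array([[31.0, 32.0], [41.0, 42.0], [0.0, 7.0]])

# Euclidean distance between matching columns of a and b.
dist = np.linalg.norm(a - b, axis=0)
print(dist.shape)  # (2,)

# With only two components, this agrees with the np.hypot approach.
two_a, two_b = a[:2], b[:2]
print(np.allclose(np.linalg.norm(two_a - two_b, axis=0),
                  np.hypot(*(two_a - two_b))))  # True
```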
If you want to calculate other types of distances, just replace np.hypot with the calculation you'd like to use. For example, for Manhattan/city-block distances:
In [9]: dist = np.sum(np.abs(xx - yy), axis=0)
In [10]: dist
Out[10]: array([60, 50, 50, 50, 0, 50, 50, 50, 50, 0, 0, 0, 0, 0, 0])
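The same pattern extends to other metrics. As one more sketch (Chebyshev distance is my choice of example, not something the original answer covers), take the largest absolute difference along the first axis instead of the sum:

```python
import numpy as np

# The reshaped 2xN arrays from above, written out so this runs on its own.
xx = np.array([[1, 2, 3, 4, 5, 5, 6, 7, 8, 5, 5, 6, 7, 8, 5],
               [11, 22, 23, 24, 5, 25, 26, 27, 28, 5, 5, 6, 7, 8, 5]])
yy = np.array([[31, 32, 33, 34, 5, 35, 36, 37, 38, 5, 5, 6, 7, 8, 5],
               [41, 42, 43, 44, 5, 45, 46, 47, 48, 5, 5, 6, 7, 8, 5]])

# Chebyshev distance: max(|dx|, |dy|) for each pair of points.
cheb = np.max(np.abs(xx - yy), axis=0)
print(cheb)  # [30 30 30 30  0 30 30 30 30  0  0  0  0  0  0]
```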