Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding euclidean difference between coordinates in numpy

I am trying to calculate the difference between coordinates using numpy using following code :

X = np.random.random((1000, 3))
# using broadcasting to calculate to find pairwise diffrence
diff = X.reshape(1000, 1, 3) - X
D = (diff**2).sum(2)

Can somebody explain me what the above two lines are doing? I don't understand how this calculates the euclidean distance for the coordinates.

like image 713
Mudits Avatar asked Jan 18 '26 10:01

Mudits


1 Answers

The first one calculates a random 1000×3 matrix, all values are between 0 and 1, so the rows represent points in a 3D unit cube:

np.random.random((1000, 3))

Next we use reshape, to construct an 1000×1×3 matrix, with X.reshape(1000, 1, 3).

Now when we subtract that reshaped matrix something happens that is very popular in numpy: broadcasting. So we have a 1000×1×3 matrix, and a 1000×1×3 matrix. That means that we will "blow up" the first matrix by hypothetically repeating the second dimension 1000 times. So a matrix like:

[[[x1, y1, z1]],
 [[x2, y2, z2]],
 ...
 [[xn, yn, zn]]]

Will now be transformed into:

[[[x1, y1, z1], [x1, y1, z1], ..., [x1, y1, z1]],
 [[x2, y2, z2], [x2, y2, z2], ..., [x2, y2, z2]],
  ...         , ...              , ...          ,
 [[xn, yn, zn], [xn, yn, zn], ..., [xn, yn, zn]]]

Where each triple is repeated n times per row.

We blow up the second matrix as well, such that:

[[x1, y1, z1],
 [x2, y2, z2],
 ...
 [xn, yn, zn]]

(mind that we have one pair of square brackets less per row), will blow up to:

[[x1, y1, z1], [x2, y2, z2], ..., [xn, yn, zn],
 [x1, y1, z1], [x2, y2, z2], ..., [xn, yn, zn],
 ...         , ...,          ..., ...         ,
 [x1, y1, z1], [x2, y2, z2], ..., [xn, yn, zn]]

If we now subtract the two matrices, we get:

[[[x1-x1, y1-y1, z1-z1], [x1-x2, y1-y2, z1-z2], ..., [x1-xn, y1-yn, z1-zn]],
 [[x2-x1, y2-y1, z2-z1], [x2-x2, y2-y2, z2-z2], ..., [x2-xn, y2-yn, z2-zn]],
  ...         , ...              , ...          ,
 [[xn-x1, yn-y1, zn-z1], [xn-x2, yn-y2, zn-z2], ..., [xn-xn, yn-yn, zn-zn]]]

So we constructed a 1000×1000×3 matrix, where the i,j-th element is a 3-tuple containing the difference of the coordinates of the point specified at row i and the coordinate of the point specified at row j. We store that result in diff.

Next we use diff**2 to caculate the elementwise square, so now:

(diff**2)[i, j] == array([(xi-xj)**2, (yi-yj)**2, (zi-zj)**2])

Finally we call .sum(2) on that matrix (this means axis=2). This means that we for each element of (diff**2) will sum up the components, so then:

((diff**2).sum())[i, j] == (xi-xj)**2 + (yi-yj)**2 + (zi-zj)**2

So we calculate a 1000×1000 matrix where the [i, j]-th element is the square of the Euclidean distance. Note that we do not calculate the Euclidean distance, but its square. We can however calculate it, by applying np.sqrt(..) on the result, so:

D = np.sqrt((diff**2).sum(2))  # Euclidean distance
like image 56
Willem Van Onsem Avatar answered Jan 20 '26 01:01

Willem Van Onsem



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!