Finding euclidean difference between coordinates in numpy

Question

I am trying to calculate the difference between coordinates using numpy using following code :

X = np.random.random((1000, 3))
# using broadcasting to calculate to find pairwise diffrence
diff = X.reshape(1000, 1, 3) - X
D = (diff**2).sum(2)

Can somebody explain me what the above two lines are doing? I don't understand how this calculates the euclidean distance for the coordinates.

Willem Van Onsem · Accepted Answer

The first one calculates a random 1000×3 matrix, all values are between 0 and 1, so the rows represent points in a 3D unit cube:

np.random.random((1000, 3))

Next we use reshape, to construct an 1000×1×3 matrix, with X.reshape(1000, 1, 3).

Now when we subtract that reshaped matrix something happens that is very popular in numpy: broadcasting. So we have a 1000×1×3 matrix, and a 1000×1×3 matrix. That means that we will "blow up" the first matrix by hypothetically repeating the second dimension 1000 times. So a matrix like:

[[[x1, y1, z1]],
 [[x2, y2, z2]],
 ...
 [[xn, yn, zn]]]

Will now be transformed into:

[[[x1, y1, z1], [x1, y1, z1], ..., [x1, y1, z1]],
 [[x2, y2, z2], [x2, y2, z2], ..., [x2, y2, z2]],
  ...         , ...              , ...          ,
 [[xn, yn, zn], [xn, yn, zn], ..., [xn, yn, zn]]]

Where each triple is repeated n times per row.

We blow up the second matrix as well, such that:

[[x1, y1, z1],
 [x2, y2, z2],
 ...
 [xn, yn, zn]]

(mind that we have one pair of square brackets less per row), will blow up to:

[[x1, y1, z1], [x2, y2, z2], ..., [xn, yn, zn],
 [x1, y1, z1], [x2, y2, z2], ..., [xn, yn, zn],
 ...         , ...,          ..., ...         ,
 [x1, y1, z1], [x2, y2, z2], ..., [xn, yn, zn]]

If we now subtract the two matrices, we get:

[[[x1-x1, y1-y1, z1-z1], [x1-x2, y1-y2, z1-z2], ..., [x1-xn, y1-yn, z1-zn]],
 [[x2-x1, y2-y1, z2-z1], [x2-x2, y2-y2, z2-z2], ..., [x2-xn, y2-yn, z2-zn]],
  ...         , ...              , ...          ,
 [[xn-x1, yn-y1, zn-z1], [xn-x2, yn-y2, zn-z2], ..., [xn-xn, yn-yn, zn-zn]]]

So we constructed a 1000×1000×3 matrix, where the i,j-th element is a 3-tuple containing the difference of the coordinates of the point specified at row i and the coordinate of the point specified at row j. We store that result in diff.

Next we use diff**2 to caculate the elementwise square, so now:

(diff**2)[i, j] == array([(xi-xj)**2, (yi-yj)**2, (zi-zj)**2])

Finally we call .sum(2) on that matrix (this means axis=2). This means that we for each element of (diff**2) will sum up the components, so then:

((diff**2).sum())[i, j] == (xi-xj)**2 + (yi-yj)**2 + (zi-zj)**2

So we calculate a 1000×1000 matrix where the [i, j]-th element is the square of the Euclidean distance. Note that we do not calculate the Euclidean distance, but its square. We can however calculate it, by applying np.sqrt(..) on the result, so:

D = np.sqrt((diff**2).sum(2))  # Euclidean distance

Finding euclidean difference between coordinates in numpy

Tags:

python

matrix

numpy

euclidean-distance

Mudits

1 Answers

Willem Van Onsem

Recent Activity

Donate For Us

Finding euclidean difference between coordinates in numpy

Tags:

python

matrix

numpy

euclidean-distance

Mudits

1 Answers

Willem Van Onsem

Related questions

Recent Activity

Donate For Us