I am new to Python so this question might look trivia. However, I did not find a similar case to mine. I have a matrix of coordinates for 20 nodes. I want to compute the euclidean distance between all pairs of nodes from this set and store them in a pairwise matrix. For example, If I have 20 nodes, I want the end result to be a matrix of (20,20) with values of euclidean distance between each pairs of nodes. I tried to used a for loop to go through each element of the coordinate set and compute euclidean distance as follows: <pre class="prettyprint"><code>ncoord=numpy.matrix('3225 318;2387 989;1228 2335;57 1569;2288 8138;3514 2350;7936 314;9888 4683;6901 1834;7515 8231;709 3701;1321 8881;2290 2350;5687 5034;760 9868;2378 7521;9025 5385;4819 5943;2917 9418;3928 9770') n=20 c=numpy.zeros((n,n)) for i in range(0,n): for j in range(i+1,n): c[i][j]=math.sqrt((ncoord[i][0]-ncoord[j][0])**2+(ncoord[i][1]-ncoord[j][1])**2) </code></pre> How ever, I am getting an error of "input must be a square array ". I wonder if anybody knows what is happening here. Thanks

<pre class="prettyprint"><code>for i in range(0, n): for j in range(i+1, n): c[i, j] = math.sqrt((ncoord[i, 0] - ncoord[j, 0])**2 + (ncoord[i, 1] - ncoord[j, 1])**2) </code></pre> Note: <code>ncoord[i, j]</code> is not the same as <code>ncoord[i][j]</code> for a Numpy matrix. This appears to be the source of confusion. If <code>ncoord</code> is a Numpy array then they will give the same result. For a Numpy matrix, <code>ncoord[i]</code> returns the ith row of <code>ncoord</code>, which itself is a Numpy matrix object with shape 1 x 2 in your case. Therefore, <code>ncoord[i][j]</code> actually means: take the ith row of <code>ncoord</code> and take the jth row of that 1 x 2 matrix. This is where your indexing problems comes about when <code>j</code> > 0. Regarding your comments on assigning to <code>c[i][j]</code> "working", it shouldn't. At least on my build of Numpy 1.9.1 it shouldn't work if your indices <code>i</code> and <code>j</code> iterates up to <code>n</code>. As an aside, remember to add the transpose of the matrix <code>c</code> to itself. It is recommended to use Numpy arrays instead of matrix. See this post. If your coordinates are stored as a Numpy array, then pairwise distance can be computed as: <pre class="prettyprint"><code>from scipy.spatial.distance import pdist pairwise_distances = pdist(ncoord, metric="euclidean", p=2) </code></pre> or simply <pre class="prettyprint"><code>pairwise_distances = pdist(ncoord) </code></pre> since the default metric is "euclidean", and default "p" is 2. In a comment below I mistakenly mentioned that the result of pdist is a n x n matrix. To get a n x n matrix, you will need to do the following: <pre class="prettyprint"><code>from scipy.spatial.distance import pdist, squareform pairwise_distances = squareform(pdist(ncoord)) </code></pre> or <pre class="prettyprint"><code>from scipy.spatial.distance import cdist pairwise_distances = cdist(ncoord, ncoord) </code></pre>

Computing Euclidean distance for numpy in python

Tags:

python

numpy

I am new to Python so this question might look trivia. However, I did not find a similar case to mine. I have a matrix of coordinates for 20 nodes. I want to compute the euclidean distance between all pairs of nodes from this set and store them in a pairwise matrix. For example, If I have 20 nodes, I want the end result to be a matrix of (20,20) with values of euclidean distance between each pairs of nodes. I tried to used a for loop to go through each element of the coordinate set and compute euclidean distance as follows:

ncoord=numpy.matrix('3225   318;2387    989;1228    2335;57      1569;2288  8138;3514   2350;7936   314;9888    4683;6901   1834;7515   8231;709   3701;1321    8881;2290   2350;5687   5034;760    9868;2378   7521;9025   5385;4819   5943;2917   9418;3928   9770')
n=20 
c=numpy.zeros((n,n))
for i in range(0,n):
    for j in range(i+1,n):
        c[i][j]=math.sqrt((ncoord[i][0]-ncoord[j][0])**2+(ncoord[i][1]-ncoord[j][1])**2)

How ever, I am getting an error of "input must be a square array ". I wonder if anybody knows what is happening here. Thanks

403

asked Feb 24 '15 02:02

Fairy

2 Answers

There are much, much faster alternatives to using nested for loops for this. I'll show you two different approaches - the first will be a more general method that will introduce you to broadcasting and vectorization, and the second uses a more convenient scipy library function.

1. The general way, using broadcasting & vectorization

One of the first things I'd suggest doing is switching to using np.array rather than np.matrix. Arrays are preferred for a number of reasons, most importantly because they can have >2 dimensions, and they make element-wise multiplication much less awkward.

import numpy as np

ncoord = np.array(ncoord)

With an array, we can eliminate the nested for loops by inserting a new singleton dimension and broadcasting the subtraction over it:

# indexing with None (or np.newaxis) inserts a new dimension of size 1
print(ncoord[:, :, None].shape)
# (20, 2, 1)

# by making the 'inner' dimensions equal to 1, i.e. (20, 2, 1) - (1, 2, 20),
# the subtraction is 'broadcast' over every pair of rows in ncoord
xydiff = ncoord[:, :, None] - ncoord[:, :, None].T

print(xydiff.shape)
# (20, 2, 20)

This is equivalent to looping over every pair of rows using nested for loops, but much, much faster!

xydiff2 = np.zeros((20, 2, 20), dtype=xydiff.dtype)
for ii in range(20):
    for jj in range(20):
        for kk in range(2):
            xydiff[ii, kk, jj] = ncoords[ii, kk] - ncoords[jj, kk]

# check that these give the same result
print(np.all(xydiff == xydiff2))
# True

The rest we can also do using vectorized operations:

# we square the differences and sum over the 'middle' axis, equivalent to
# computing (x_i - x_j) ** 2 + (y_i - y_j) ** 2
ssdiff = (xydiff * xydiff).sum(1)

# finally we take the square root
D = np.sqrt(ssdiff)

The whole thing could be done in one line like this:

D = np.sqrt(((ncoord[:, :, None] - ncoord[:, :, None].T) ** 2).sum(1))

2. The lazy way, using `pdist`

It turns out that there's already a fast and convenient function for computing all pairwise distances: scipy.spatial.distance.pdist.

from scipy.spatial.distance import pdist, squareform

d = pdist(ncoord)

# pdist just returns the upper triangle of the pairwise distance matrix. to get
# the whole (20, 20) array we can use squareform:

print(d.shape)
# (190,)

D2 = squareform(d)
print(D2.shape)
# (20, 20)

# check that the two methods are equivalent
print np.all(D == D2)
# True

164

answered Sep 17 '22 23:09

ali_m

for i in range(0, n):
    for j in range(i+1, n):
        c[i, j] = math.sqrt((ncoord[i, 0] - ncoord[j, 0])**2 
        + (ncoord[i, 1] - ncoord[j, 1])**2)

Note: ncoord[i, j] is not the same as ncoord[i][j] for a Numpy matrix. This appears to be the source of confusion. If ncoord is a Numpy array then they will give the same result.

For a Numpy matrix, ncoord[i] returns the ith row of ncoord, which itself is a Numpy matrix object with shape 1 x 2 in your case. Therefore, ncoord[i][j] actually means: take the ith row of ncoord and take the jth row of that 1 x 2 matrix. This is where your indexing problems comes about when j > 0.

Regarding your comments on assigning to c[i][j] "working", it shouldn't. At least on my build of Numpy 1.9.1 it shouldn't work if your indices i and j iterates up to n.

As an aside, remember to add the transpose of the matrix c to itself.

It is recommended to use Numpy arrays instead of matrix. See this post.

If your coordinates are stored as a Numpy array, then pairwise distance can be computed as:

from scipy.spatial.distance import pdist

pairwise_distances = pdist(ncoord, metric="euclidean", p=2)

or simply

pairwise_distances = pdist(ncoord)

since the default metric is "euclidean", and default "p" is 2.

In a comment below I mistakenly mentioned that the result of pdist is a n x n matrix. To get a n x n matrix, you will need to do the following:

from scipy.spatial.distance import pdist, squareform

pairwise_distances = squareform(pdist(ncoord))

from scipy.spatial.distance import cdist

pairwise_distances = cdist(ncoord, ncoord)

answered Sep 19 '22 23:09

lightalchemist

Related questions
                            
                                How can I maximize a specific window with Python?
                            
                                Download files from a list if not already downloaded
                            
                                Condensed matrix function to find pairs
                            
                                python head, tail and backward read by lines of a text file
                            
                                Very simple concurrent programming in Python
                            
                                simple python oAuth 1.0 example with consumer key and secret
                            
                                Python nested loop with generators does not work (in some cases)?
                            
                                Django runserver in command with custom settings
                            
                                compare if an element exists in two lists
                            
                                Find the smallest number in a python list and print the position
                            
                                How to ignore an output of a multi-output function in Python? [duplicate]
                            
                                Python os.chdir is modifying the passed directory name
                            
                                How to display a byte array as hex values
                            
                                Catch all path in flask app
                            
                                How to process GET Query String with Flask
                            
                                list comprehension joining every two elements together in a list
                            
                                How to disable selection highlighting in a QTableWidget
                            
                                How do I add attributes to SubElement in ElementTree (Python)
                            
                                HoughCircles circle detection using opencv and python-
                            
                                Python MatplotLib plot x-axis with first x-axis value labeled as 1 (instead of 0)

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Computing Euclidean distance for numpy in python

Tags:

python

numpy

Fairy

People also ask

2 Answers

1. The general way, using broadcasting & vectorization

2. The lazy way, using `pdist`

ali_m

lightalchemist

Recent Activity

Donate For Us

Computing Euclidean distance for numpy in python

Tags:

python

numpy

Fairy

People also ask

2 Answers

1. The general way, using broadcasting & vectorization

2. The lazy way, using pdist

ali_m

lightalchemist

Related questions

Recent Activity

Donate For Us

2. The lazy way, using `pdist`