 

Efficient distance calculation between N points and a reference in numpy/scipy


I just started using scipy/numpy. I have a 100000*3 array, where each row is a coordinate, and a 1*3 center point. I want to calculate the distance from each row in the array to the center and store the results in another array. What is the most efficient way to do it?

D. Huang asked Jun 21 '11 18:06




1 Answer

I would take a look at scipy.spatial.distance.cdist:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html

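    import numpy as np
    import scipy.spatial.distance  # a bare "import scipy" may not load this submodule on older SciPy

    a = np.random.normal(size=(10, 3))  # 10 sample points, one per row
    b = np.random.normal(size=(1, 3))   # the reference (center) point

    dist = scipy.spatial.distance.cdist(a, b)  # pick the appropriate distance metric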

For the default distance metric (Euclidean), dist is equivalent to the following, except that cdist returns an array of shape (10, 1) rather than (10,):

np.sqrt(np.sum((a-b)**2,axis=1))   

although cdist is much more efficient for large arrays (on my machine, for your problem size, cdist is faster by a factor of ~35).
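To make that concrete for the size in the question, here is a minimal sketch that runs the approaches side by side and checks that they agree. The names points, center, and the d_* variables are placeholders I made up, the data is random, and np.linalg.norm is an extra option that the answer above does not mention. The "cityblock" line is only there to show how a non-default metric would be selected.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)           # seeded only so the sketch is repeatable
points = rng.normal(size=(100_000, 3))   # stand-in for the 100000*3 coordinate array
center = rng.normal(size=(1, 3))         # stand-in for the 1*3 center point

# cdist returns one column per row of `center`, i.e. shape (100000, 1);
# ravel() flattens it to a 1-D array of distances.
d_cdist = cdist(points, center).ravel()

# Broadcasting subtracts the center from every row, as in the expression above.
d_manual = np.sqrt(np.sum((points - center) ** 2, axis=1))

# np.linalg.norm with axis=1 computes the same row-wise Euclidean norm.
d_norm = np.linalg.norm(points - center, axis=1)

# Other metrics are selected by name, e.g. Manhattan (city block) distance.
d_city = cdist(points, center, metric="cityblock").ravel()

# Report the result shape and whether the Euclidean variants agree.
print(d_cdist.shape, np.allclose(d_cdist, d_manual), np.allclose(d_cdist, d_norm))
```

Relative speed depends on the NumPy/SciPy versions and the hardware, so it is worth timing the variants yourself (for example with timeit) before settling on one.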

JoshAdel answered Oct 26 '22 23:10