In my project I need to compute the Euclidean distance between each pair of points stored in an array. The input array is a 2D NumPy array with 3 columns, which are the coordinates (x, y, z), and each row defines a new point.
I usually work with 5000 to 6000 points in my test cases.
My first algorithm uses Cython and my second NumPy. I find that my NumPy algorithm is faster than the Cython one.
Edit: with 6000 points:
numpy 1.76 s / cython 4.36 s
Here's my Cython code:
cimport cython
from libc.math cimport sqrt

@cython.boundscheck(False)
@cython.wraparound(False)
cdef void calcul1(double[::1] M, double[::1] R):
    cdef int i = 0
    cdef int max = M.shape[0]
    cdef int x, y
    # Each point occupies 3 consecutive slots of the flattened array,
    # so both loops step by 3; the inner loop starts at x + 3 so that
    # each unordered pair is visited exactly once.
    for x in range(0, max, 3):
        for y in range(x + 3, max, 3):
            R[i] = sqrt((M[y] - M[x])**2 + (M[y+1] - M[x+1])**2 + (M[y+2] - M[x+2])**2)
            i += 1
M is a memoryview of the initial input array, flattened with flatten() by NumPy before the call to calcul1(); R is a memoryview of a 1D output array that stores all the results.
Here's my NumPy code:
import numpy as np

def calcul2(M):
    return np.sqrt(((M[:, :, np.newaxis] - M[:, np.newaxis, :])**2).sum(axis=0))
Here M is the initial input array, transposed with transpose() by NumPy before the function call so that the coordinates (x, y, z) are the rows and the points are the columns.
Moreover, this NumPy function is quite convenient because the array it returns is well organized. It's an n-by-n array, with n the number of points, and each point has a row and a column. So, for example, the distance AB is stored at the intersection of row A and column B, as in the snippet below.
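For illustration, a minimal sketch (the indices A and B below are made-up examples):

D = calcul2(M.transpose())  # n-by-n distance matrix
A, B = 0, 42                # example point indices
d_AB = D[A, B]              # distance between points A and B
assert d_AB == D[B, A]      # the matrix is symmetric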
Here's how I call them (from a Cython test function):
cpdef test():
    cdef double[::1] Mf
    cdef double[::1] out = np.empty(17997000, dtype=np.float64)  # (6000² - 6000) / 2
    M = np.arange(6000*3, dtype=np.float64).reshape(6000, 3)  # example array with 6000 points
    Mf = M.flatten()    # because my Cython algorithm needs a 1D array
    Mt = M.transpose()  # because my NumPy algorithm needs coordinates as rows
    calcul2(Mt)
    calcul1(Mf, out)
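For reference, I build the extension the usual way; a minimal build script (the file name pairwise.pyx is just a placeholder for my module):

# setup.py -- build with: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("pairwise.pyx"))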
Am I doing something wrong here? For my project, neither is fast enough.
1: Is there a way to improve my Cython code in order to beat NumPy's speed?
2: Is there a way to improve my NumPy code to compute even faster?
3: Or is there any other solution, as long as it stays in Python/Cython (like parallel computing)? See the sketch just below.
Thank you.
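For question 3, one direction I have started to look at is Cython's prange, which runs the outer loop across OpenMP threads. Below is a minimal, unbenchmarked sketch, assuming the extension is compiled with OpenMP (e.g. extra_compile_args=['-fopenmp']); the name calcul1_par is made up. It writes into an n-by-n upper triangle instead of a flat array, because the running counter i in calcul1 makes iterations depend on each other and would prevent parallelization.

# cython: boundscheck=False, wraparound=False
from cython.parallel import prange
from libc.math cimport sqrt

def calcul1_par(double[:, ::1] M, double[:, ::1] R):
    # M: (n, 3) points as rows; R: (n, n) output, upper triangle filled
    cdef Py_ssize_t n = M.shape[0]
    cdef Py_ssize_t i, j
    cdef double dx, dy, dz
    for i in prange(n, nogil=True, schedule='static'):
        for j in range(i + 1, n):
            dx = M[i, 0] - M[j, 0]
            dy = M[i, 1] - M[j, 1]
            dz = M[i, 2] - M[j, 2]
            R[i, j] = sqrt(dx * dx + dy * dy + dz * dz)

With 6000 points the upper triangle holds the same 17,997,000 distances as out, just addressed as R[i, j] instead of a flat index.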
Not sure where you are getting your timings, but you can use scipy.spatial.distance:
import numpy as np
from scipy.spatial import distance as sd  # the `sd` alias used below

M = np.arange(6000*3, dtype=np.float64).reshape(6000, 3)
np_result = calcul2(M)
sp_result = sd.cdist(M.T, M.T)  # SciPy usage
np.allclose(np_result, sp_result)
>>> True
Timings:
%timeit calcul2(M)
1000 loops, best of 3: 313 µs per loop
%timeit sd.cdist(M.T, M.T)
10000 loops, best of 3: 86.4 µs per loop
Importantly, it's also useful to realize that your output is symmetric:
np.allclose(sp_result, sp_result.T)
>>> True
An alternative is to compute only the upper triangle of this array:
%timeit sd.pdist(M.T)
10000 loops, best of 3: 39.1 µs per loop
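pdist returns that upper triangle as a condensed 1-D array of the n*(n-1)/2 unique distances. If you still want the square row/column lookup that calcul2 gives you, squareform (from the same scipy.spatial.distance module; a small sketch, not part of the timings above) expands it back:

condensed = sd.pdist(M.T)          # condensed 1-D array of unique distances
square = sd.squareform(condensed)  # full symmetric matrix, zeros on the diagonal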
Edit: Not sure which axis you want to treat as the points; it looks like you may be doing it both ways. Using the other axis for comparison:
%timeit sd.pdist(M)
10 loops, best of 3: 135 ms per loop
Still about 10x faster than your current NumPy implementation.
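If you would rather stay in pure NumPy, one standard rewrite (a sketch, not benchmarked here; calcul3 is a made-up name) expands ||a - b||² = ||a||² - 2·a·b + ||b||², so the heavy lifting becomes a single matrix product instead of the full (3, n, n) broadcast difference:

import numpy as np

def calcul3(M):
    # M: (n, 3) points as rows; returns the (n, n) distance matrix
    sq = np.einsum('ij,ij->i', M, M)                  # squared norm of each point
    d2 = sq[:, None] + sq[None, :] - 2.0 * (M @ M.T)  # squared distances
    np.maximum(d2, 0.0, out=d2)                       # clamp round-off negatives
    return np.sqrt(d2)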