Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast fuse of close points in a numpy-2d (vectorized)

I have a question similar to the question asked here: simple way of fusing a few close points. I want to replace points that are located close to each other with the average of their coordinates. The closeness in cells is specified by the user (I am talking about euclidean distance).

In my case I have a lot of points (about 1-million). This method is working, but is time consuming as it uses a double for loop.

Is there a faster way to detect and fuse close points in a numpy 2d array?


To be complete I added an example:

points=array([[  382.49056159,   640.1731949 ],
   [  496.44669161,   655.8583119 ],
   [ 1255.64762859,   672.99699399],
   [ 1070.16520917,   688.33538171],
   [  318.89390168,   718.05989421],
   [  259.7106383 ,   822.2       ],
   [  141.52574427,    28.68594436],
   [ 1061.13573287,    28.7094536 ],
   [  820.57417943,    84.27702407],
   [  806.71416007,   108.50307828]])

A scatterplot of the points is visible below. The red circle indicates the points located close to each other (in this case a distance of 27.91 between the last two points in the array). So if the user would specify a minimum distance of 30 these points should be fused.

enter image description here

In the output of the fuse function the last to points are fused. This will look like:

#output
array([[  382.49056159,   640.1731949 ],
   [  496.44669161,   655.8583119 ],
   [ 1255.64762859,   672.99699399],
   [ 1070.16520917,   688.33538171],
   [  318.89390168,   718.05989421],
   [  259.7106383 ,   822.2       ],
   [  141.52574427,    28.68594436],
   [ 1061.13573287,    28.7094536 ],
   [  813.64416975,    96.390051175]])
like image 292
Wilmar van Ommeren Avatar asked May 02 '16 14:05

Wilmar van Ommeren


2 Answers

If you have a large number of points then it may be faster to build a k-D tree using scipy.spatial.KDTree, then query it for pairs of points that are closer than some threshold:

import numpy as np
from scipy.spatial import KDTree

tree = KDTree(points)
rows_to_fuse = tree.query_pairs(r=30)    

print(repr(rows_to_fuse))
# {(8, 9)}

print(repr(points[list(rows_to_fuse)]))
# array([[ 820.57417943,   84.27702407],
#        [ 806.71416007,  108.50307828]])

The major advantage of this approach is that you don't need to compute the distance between every pair of points in your dataset.

like image 98
ali_m Avatar answered Nov 06 '22 00:11

ali_m


You can use scipy's distance functions such as pdist in order to quickly find which points should be merged:

import numpy as np
from scipy.spatial.distance import pdist, squareform

d = squareform(pdist(a))
d = np.ma.array(d, mask=np.isclose(d, 0))
a[d.min(axis=1) < 30]
#array([[ 820.57417943,   84.27702407],
#       [ 806.71416007,  108.50307828]])

NOTE

For large samples this method can cause memory errors since it is storing a full matrix containing the relative distances.

like image 38
Saullo G. P. Castro Avatar answered Nov 06 '22 00:11

Saullo G. P. Castro