Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find all nearest neighbors within a specific distance

I have a large list of x and y coordinates, stored in an numpy array.

Coordinates = [[ 60037633 289492298]
 [ 60782468 289401668]
 [ 60057234 289419794]]
...
...

What I want is to find all nearest neighbors within a specific distance (lets say 3 meters) and store the result so that I later can do some further analysis on the result.

For most packages I found it is necessary to decided how many NNs should be found but I just want all within the set distance.

How can I achieve something like that and what is the fastest and best way to achieve something like that for a large dataset (some million points)?

like image 945
Kitumijasi Avatar asked Sep 06 '15 14:09

Kitumijasi


People also ask

How do you find the nearest neighbors distance?

The average nearest neighbor ratio is calculated as the observed average distance divided by the expected average distance (with expected average distance being based on a hypothetical random distribution with the same number of features covering the same total area).

What is near Neighbour analysis?

Nearest Neighbour Analysis measures the spread or distribution of something over a geographical space. It provides a numerical value that describes the extent to which a set of points are clustered or uniformly spaced.

What is Sklearn neighbors used for?

sklearn. neighbors provides functionality for unsupervised and supervised neighbors-based learning methods. Unsupervised nearest neighbors is the foundation of many other learning methods, notably manifold learning and spectral clustering.


1 Answers

You could use a scipy.spatial.cKDTree:

import numpy as np
import scipy.spatial as spatial
points = np.array([(1, 2), (3, 4), (4, 5)])
point_tree = spatial.cKDTree(points)
# This finds the index of all points within distance 1 of [1.5,2.5].
print(point_tree.query_ball_point([1.5, 2.5], 1))
# [0]

# This gives the point in the KDTree which is within 1 unit of [1.5, 2.5]
print(point_tree.data[point_tree.query_ball_point([1.5, 2.5], 1)])
# [[1 2]]

# More than one point is within 3 units of [1.5, 1.6].
print(point_tree.data[point_tree.query_ball_point([1.5, 1.6], 3)])
# [[1 2]
#  [3 4]]

Here is an example showing how you can find all the nearest neighbors to an array of points, with one call to point_tree.query_ball_point:

import numpy as np
import scipy.spatial as spatial
import matplotlib.pyplot as plt
np.random.seed(2015)

centers = [(1, 2), (3, 4), (4, 5)]
points = np.concatenate([pt+np.random.random((10, 2))*0.5 
                         for pt in centers])
point_tree = spatial.cKDTree(points)

cmap = plt.get_cmap('copper')
colors = cmap(np.linspace(0, 1, len(centers)))
for center, group, color  in zip(centers, point_tree.query_ball_point(centers, 0.5), colors):
   cluster = point_tree.data[group]
   x, y = cluster[:, 0], cluster[:, 1]
   plt.scatter(x, y, c=color, s=200)

plt.show()

enter image description here

like image 118
unutbu Avatar answered Oct 13 '22 04:10

unutbu