I have a large set of features that looks like this:
id1 28273 20866 29961 27190 31790 19714 8643 14482 5384 .... upto 1000
id2 12343 45634 29961 27130 33790 14714 7633 15483 4484 ....
id3 ..... ..... ..... ..... ..... ..... .... ..... .... .... . . .
...
id200000 .... .... ... .. . . . .
I want to compute, for each id, the Euclidean distance to every other point, then sort them to find the 5 nearest points. Because my dataset is very large, what is the best way to do this?
Approach: The naive idea is, for each query point, to calculate the Euclidean distance to every other point, sort the points by that distance, and take the first k closest points from the list. The k points obtained can be reported in any order.
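As a minimal NumPy sketch of this brute-force idea (the six points here are made up for illustration; note that materializing all pairwise distances needs O(n²) memory, so this exact approach will not scale to 200000 points):

```python
import numpy as np

# Small made-up dataset: 6 points with 4 features each.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [10.0, 10.0, 10.0, 10.0],
    [1.1, 2.1, 3.1, 4.1],
    [9.0, 9.0, 9.0, 9.0],
    [0.0, 0.0, 0.0, 0.0],
])
k = 2

# Pairwise Euclidean distances via broadcasting:
# diff[i, j] = X[i] - X[j], shape (n, n, d).
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# For each row, sort by distance; column 0 is the point itself
# (distance 0), so take columns 1..k as the k nearest neighbors.
neighbors = np.argsort(dist, axis=1)[:, 1:k + 1]
print(neighbors[0])  # nearest neighbors of point 0: [3 1]
```

For the real dataset you would process queries in chunks instead of building the full distance matrix at once.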
k-nearest neighbor search identifies the top k nearest neighbors to the query. This technique is commonly used in predictive analytics to estimate or classify a point based on the consensus of its neighbors.
A common rule of thumb is K ≈ √N, where N is the total number of samples; in practice, use an error plot or accuracy plot to find the most favorable K value. KNN handles multi-class problems well, but you must be aware of the outliers.
With the help of KNN algorithms, we can classify a potential voter into various classes like “Will Vote”, “Will Not Vote”, “Will Vote for Party 'Congress'”, or “Will Vote for Party 'BJP'”. Other areas in which the KNN algorithm can be used include speech recognition, handwriting recognition, image recognition and video recognition.
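As an illustrative sketch of KNN classification with scikit-learn's KNeighborsClassifier (the toy features and labels here are entirely made up):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up toy data: 2-D feature vectors in two well-separated clusters.
X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]], dtype=float)
y = np.array(["will_vote", "will_vote", "will_vote",
              "will_not_vote", "will_not_vote", "will_not_vote"])

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

# A new point near the first cluster is classified by the
# majority label of its 3 nearest neighbors.
print(clf.predict([[1.5, 1.5]]))  # ['will_vote']
```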
scikit-learn has nearest neighbor search. Example:
Load your data into a NumPy array.
>>> import numpy as np
>>> X = np.array([[28273, 20866, 29961, 27190, 31790, 19714, 8643, 14482, 5384, ...],
[12343, 45634, 29961, 27130, 33790, 14714, 7633, 15483, 4484, ...],
...
])
(Just two points shown.)
Fit a NearestNeighbors object.
>>> from sklearn.neighbors import NearestNeighbors
>>> knn = NearestNeighbors(n_neighbors=5)
>>> knn.fit(X)
NearestNeighbors(algorithm='auto', leaf_size=30, n_neighbors=5, p=2,
radius=1.0, warn_on_equidistant=True)
p=2 means Euclidean (L2) distance; p=1 would mean Manhattan (L1) distance.
Perform queries. To get the neighbors of X[0], your first data point:
>>> knn.kneighbors(X[0:1], return_distance=False)
array([[0, 1]])
So, the nearest neighbors of X[0] are X[0] itself and X[1] (of course).
Make sure you set n_neighbors=6 because every point in your set is going to be its own nearest neighbor.
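Putting those pieces together, here is a sketch of an end-to-end run (the random data is just a stand-in for the real feature matrix, and a recent scikit-learn is assumed, where kneighbors takes a 2-D array):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Made-up stand-in for the real data: 200 points, 10 features.
rng = np.random.default_rng(0)
X = rng.integers(0, 50000, size=(200, 10)).astype(float)

# Ask for 6 neighbors: each point is its own nearest neighbor,
# so the first column of the result is the query point itself.
knn = NearestNeighbors(n_neighbors=6)
knn.fit(X)

# kneighbors expects a 2-D array; X queries all points at once,
# and a single query point would be passed as X[0:1].
distances, indices = knn.kneighbors(X)

# Drop column 0 (the point itself) to keep the 5 true neighbors.
nearest5 = indices[:, 1:]
print(nearest5.shape)  # (200, 5)
```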
Disclaimer: I'm involved in scikit-learn development, so this is not unbiased advice.
From your question it is not entirely clear what the specifics of your problem are. What I understand so far is that you need to calculate Euclidean distances between a large number of data points. The fastest solution in Python probably makes use of the scipy.spatial.distance module. Please have a look at
http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html
and
http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html
You will have to familiarize yourself with the NumPy data types, prepare input data for one of these functions, and then evaluate the resulting data. You'll probably end up trying to get the N minimum (or maximum) values of an array, at which point How to get indices of N maximum values in a numpy array? could help.
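As a sketch of that scipy.spatial.distance route (the chunk size and the random stand-in data are made up for illustration; cdist over the full 200000 × 200000 matrix would not fit in memory, so queries are processed in chunks):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Made-up stand-in data: 500 points, 20 features.
rng = np.random.default_rng(1)
X = rng.random((500, 20))
k = 5

# Process query points in chunks so the full pairwise distance
# matrix never has to be held in memory at once.
chunk = 100
nearest = np.empty((len(X), k), dtype=int)
for start in range(0, len(X), chunk):
    block = X[start:start + chunk]
    d = cdist(block, X)  # distances from this chunk to all points
    # Exclude each point from its own neighbor list.
    rows = np.arange(len(block))
    d[rows, np.arange(start, start + len(block))] = np.inf
    # argpartition finds the k smallest per row without a full sort...
    idx = np.argpartition(d, k, axis=1)[:, :k]
    # ...then only those k are sorted by actual distance.
    order = np.argsort(np.take_along_axis(d, idx, axis=1), axis=1)
    nearest[start:start + chunk] = np.take_along_axis(idx, order, axis=1)
print(nearest.shape)  # (500, 5)
```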