Generate 'K' Nearest Neighbours to a datapoint

I need to generate the K nearest neighbours of a given datapoint. I read up on the sklearn.neighbors module, but it seems to generate neighbours between two sets of data. What I want is a list of the 100 datapoints closest to the datapoint I pass in.

Any KNN algorithm must find these K datapoints under the hood anyway. Is there a way to have these K points returned as output?

Here is my sample notebook.

user248884 asked Dec 21 '18 13:12



2 Answers

from sklearn.neighbors import NearestNeighbors 

This gives you the indices of the k nearest neighbors in your dataset. Call kneighbors: it returns a tuple whose first element is the array of distances and whose second element is the array of neighbor indices. From the documentation:

>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=1)
>>> neigh.fit(samples) 
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> print(neigh.kneighbors([[1., 1., 1.]])) 
(array([[0.5]]), array([[2]]))
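
To get the points themselves rather than only their indices, the returned index array can be used to index back into the dataset. The following is a minimal sketch, assuming the data is a NumPy array; the random `data`, the `query` point, and `k = 100` are placeholders chosen for illustration, not values from the question's notebook.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical data: 1000 random 3-D points stand in for the real dataset.
rng = np.random.default_rng(0)
data = rng.random((1000, 3))

# kneighbors expects a 2-D array, so a single query point needs shape (1, n_features).
query = np.array([[0.5, 0.5, 0.5]])

k = 100
neigh = NearestNeighbors(n_neighbors=k).fit(data)
distances, indices = neigh.kneighbors(query)

# indices has shape (1, k); row 0 holds the neighbours of the single query point,
# ordered from closest to farthest.
nearest_points = data[indices[0]]
```

Here `nearest_points` has one row per neighbour, in the same order as `distances[0]`.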
Venkatachalam answered Sep 21 '22 11:09


You don't need to look under the hood.

Use a kd-tree for nearest-neighbor lookup. Once you have built the index, you query it for the k nearest neighbors.

Ref example:

>>> import numpy as np
>>> from scipy import spatial
>>> x, y = np.mgrid[0:5, 2:8]
>>> tree = spatial.KDTree(list(zip(x.ravel(), y.ravel())))
>>> pts = np.array([[0, 0], [2.1, 2.9]])
>>> tree.query(pts)
(array([ 2.        ,  0.14142136]), array([ 0, 13]))
>>> tree.query(pts[0])
(2.0, 0)
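
The example above returns only the single nearest neighbor, because `query` defaults to `k=1`. A minimal sketch of asking for several neighbors of one point follows, reusing the same grid; `k = 3` and the variable names are illustrative choices, and for the question's case `k` would be 100.

```python
import numpy as np
from scipy import spatial

# Same grid of points as in the reference example.
x, y = np.mgrid[0:5, 2:8]
points = np.column_stack([x.ravel(), y.ravel()])
tree = spatial.KDTree(points)

# Ask for the k nearest neighbours of a single query point.
k = 3
distances, indices = tree.query([2.1, 2.9], k=k)

# Index back into the original array to get the neighbour coordinates,
# ordered from closest to farthest.
nearest_points = points[indices]
```

With a single query point and `k > 1`, `distances` and `indices` are 1-D arrays of length `k`.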
gsamaras answered Sep 22 '22 11:09
