
Finding k-nearest neighbor for only one point (not an entire matrix) using R

Tags: r, knn

Is there a package or a simple way to search for the k nearest neighbors (especially with a kd-tree) of a single point in R? All the packages that provide this function (for example RANN or FNN) compute the kNN for every point in a matrix, but I need it for only one point.

For example, I have a matrix with 5 points, "A" to "E", and I want to find the 2 nearest neighbors of "A" among the 4 other points ("B" to "E") without repeating the calculation for every row in the dataset (that is, without computing the kNN for "B", "C", "D", and "E").

I hope my question is clear; my English is not good.

Thank you for your help.

Riadh asked Dec 24 '12 19:12



1 Answer

If I understand correctly, you can do this with the FNN package:

> library(FNN)
> X <- matrix(runif(25), 5, 5)
> X
          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.7475301 0.6725876 0.2511358 0.5048512 0.1196027
[2,] 0.5777907 0.6337206 0.8334608 0.5067914 0.6410024
[3,] 0.5488786 0.9613076 0.2217271 0.6906149 0.7396482
[4,] 0.8230380 0.8596784 0.6348114 0.6211107 0.3089131
[5,] 0.6531433 0.8682462 0.2555402 0.2443061 0.5292509
> knnx.dist(X[-1,], X[1, , drop=FALSE], k=2)
          [,1]     [,2]
[1,] 0.4870996 0.531889
> knnx.index(X[-1,], X[1, , drop=FALSE], k=2)
     [,1] [,2]
[1,]    3    4

Note that the result of knnx.index refers to the rows of the matrix passed to the function (here X[-1,], which omits row 1), so 3 and 4 actually mean rows 4 and 5 of the original data set.
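To make the index shift concrete, here is a minimal base-R sketch that finds the neighbors of one row by brute force and reports them as row numbers of the original matrix. The helper name nearest_to_row is hypothetical (it is not part of FNN), it uses a plain distance scan rather than a kd-tree, and X is retyped from the rounded values printed above.

```r
# X retyped from the printed matrix above (rounded to 7 decimals)
X <- matrix(c(
  0.7475301, 0.6725876, 0.2511358, 0.5048512, 0.1196027,
  0.5777907, 0.6337206, 0.8334608, 0.5067914, 0.6410024,
  0.5488786, 0.9613076, 0.2217271, 0.6906149, 0.7396482,
  0.8230380, 0.8596784, 0.6348114, 0.6211107, 0.3089131,
  0.6531433, 0.8682462, 0.2555402, 0.2443061, 0.5292509
), nrow = 5, ncol = 5, byrow = TRUE)

# Hypothetical helper (not an FNN function): k nearest neighbors of row i,
# searched only among the other rows, by brute-force Euclidean distance.
nearest_to_row <- function(X, i, k = 2) {
  others <- seq_len(nrow(X))[-i]                  # original row numbers of the candidates
  diffs  <- sweep(X[others, , drop = FALSE], 2, X[i, ])  # subtract row i from each candidate
  d      <- sqrt(rowSums(diffs^2))                # one distance per candidate row
  ord    <- order(d)[seq_len(k)]                  # positions of the k smallest distances
  list(index = others[ord], dist = unname(d[ord]))
}

res <- nearest_to_row(X, 1, k = 2)
res$index   # 4 5, i.e. rows of the original X
res$dist    # approximately 0.4871 and 0.5319
```

The same remapping should work on the FNN result: indexing the candidate row numbers with it, as in (2:5)[knnx.index(X[-1,], X[1, , drop=FALSE], k=2)], turns 3 and 4 into 4 and 5. As far as I know, knnx.index also takes an algorithm argument whose default is "kd_tree", which is what the question asks for.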

dcarlson answered Oct 12 '22 11:10