 

Find K nearest neighbors, starting from a distance matrix

I'm looking for a well-optimized function that accepts an n X n distance matrix and returns an n X k matrix with the indices of the k nearest neighbors of the ith datapoint in the ith row.

I find a gazillion different R packages that let you do KNN, but they all seem to include the distance computations along with the sorting algorithm within the same function. In particular, for most routines the main argument is the original data matrix, not a distance matrix. In my case, I'm using a nonstandard distance on mixed variable types, so I need to separate the sorting problem from the distance computations.

This is not exactly a daunting problem -- I obviously could just use the order function inside a loop to get what I want (see my solution below), but this is far from optimal. For example, the sort function with partial = 1:k when k is small (less than 11) goes much faster, but unfortunately returns only sorted values rather than the desired indices.
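The two approaches mentioned above can be sketched in base R as follows. This is only an illustration, not the asker's original solution: the helper names `knn_order` and `knn_partial` are hypothetical, and both assume a square distance matrix with no missing values. The second helper uses `sort(..., partial = k)` to find the k-th smallest distance in each row and then fully orders only the candidates at or below that threshold, which recovers indices rather than values.

```r
# Hypothetical helpers for illustration; not part of any package.

# Baseline: full ordering of every row with order().
knn_order <- function(D, k) {
  n <- nrow(D)
  stopifnot(ncol(D) == n, k >= 1, k < n)
  diag(D) <- Inf                       # a point is not its own neighbour
  nn <- matrix(0L, n, k)
  for (i in seq_len(n)) nn[i, ] <- order(D[i, ])[seq_len(k)]
  nn
}

# Variant: a partial sort finds the k-th smallest distance, then only the
# candidates at or below that threshold are fully ordered.
knn_partial <- function(D, k) {
  n <- nrow(D)
  stopifnot(ncol(D) == n, k >= 1, k < n)
  diag(D) <- Inf
  nn <- matrix(0L, n, k)
  for (i in seq_len(n)) {
    d <- unname(D[i, ])
    kth <- sort(d, partial = k)[k]     # k-th smallest value in row i
    cand <- which(d <= kth)            # at least k indices (more with ties)
    nn[i, ] <- cand[order(d[cand])][seq_len(k)]
  }
  nn
}

# Four points on a line at 0, 1, 3 and 7
D <- as.matrix(dist(c(0, 1, 3, 7)))
nn <- knn_partial(D, 2)
```

For these four points, row 1 of `nn` holds the indices 2 and 3. Whether the partial-sort variant is actually faster than the plain `order` loop depends on n and k, so it is worth benchmarking on the real data before relying on it.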

asked May 03 '14 20:05 by zkurtz



1 Answer

Try the FastKNN package from CRAN (although it is not well documented). It offers a k.nearest.neighbors function that accepts an arbitrary distance matrix. The example below computes the matrix you need.

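library(FastKNN)

# arbitrary data
train <- matrix(sample(c("a","b","c"), 12, replace=TRUE), ncol=2) # n x 2
n <- nrow(train)
distMatrix <- matrix(runif(n^2, 0, 1), ncol=n) # n x n, random stand-in for real distances
diag(distMatrix) <- 0 # a point is at distance 0 from itself

# matrix of neighbours: row i holds the indices of the k nearest neighbours of point i
k <- 3
nn <- matrix(0, n, k) # n x k
for (i in 1:n)
   nn[i,] <- k.nearest.neighbors(i, distMatrix, k = k)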

Note: You can always search the CRAN package list (Ctrl+F for 'knn') to find related functions: https://cran.r-project.org/web/packages/available_packages_by_name.html

answered Sep 29 '22 05:09 by hanna
