Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Saving and incrementally updating nearest-neighbor model in R

Tags:

r

There are several nearest neighbor R packages (e.g., FNN, RANN, yaImpute) but none of them seem to allow saving off the NN data structure (cover tree, KD tree etc.) so that the nearest neighbors of new queries can be calculated without reconstructing the whole tree. Are there any such functions in R?

I am looking for a function that returns a data structure that I can update incrementally as new data arrives to perform approximate K nearest neighbor search.

like image 447
Innuo Avatar asked Aug 30 '12 17:08

Innuo


People also ask

What are some issues with nearest neighbor methods?

A major problem with the simple nearest-neighbor algorithm is that it considers the entire set of n points for every execution. However, consider the Ann and Aknn problems where the same dataset is used n times.

How does Approximate nearest neighbor work?

In approximately nearest neighbors (ANN), we build index structures that narrow down the search space. The implicit neighborhoods in such indexes also help reduce the problem of high dimensions. You can roughly divide the approaches used for ANNs into whether or not they can be implemented using an inverse index.


1 Answers

There is a good reason why no NN package does that.

The reason is that the "NN data structure" necessarily includes all the input data points (in the form of a KD tree), so there is no space savings against the input data. It appears that there would be time savings in not having to re-create the KD-tree for each new input, but this is not the case, alas.

The reason is that the time to build a KD-tree is, in general, worse than linearithmic. This means that, for large inputs, it makes sense to sort the data before building the KD-tree because that will produce the KD-tree faster and it will be better balanced, which will improve the search too (it is also worse than logarithmic, in general). This approach would speed up modeling and evaluation but discourage incremental updates, of course.

Your best bet, I think, if to find a generic KD-tree package and use it instead.

like image 167
sds Avatar answered Sep 28 '22 07:09

sds