Saving and incrementally updating nearest-neighbor model in R

Tags:

r

There are several nearest neighbor R packages (e.g., FNN, RANN, yaImpute) but none of them seem to allow saving off the NN data structure (cover tree, KD tree etc.) so that the nearest neighbors of new queries can be calculated without reconstructing the whole tree. Are there any such functions in R?

I am looking for a function that returns a data structure that I can update incrementally as new data arrives to perform approximate K nearest neighbor search.

447

asked Aug 30 '12 17:08

Innuo

1 Answer

There is a good reason why no NN package does that.

The reason is that the "NN data structure" necessarily includes all the input data points (in the form of a KD tree), so there is no space savings against the input data. It appears that there would be time savings in not having to re-create the KD-tree for each new input, but this is not the case, alas.
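To make the point concrete, here is a minimal sketch in base R, assuming the "model" is nothing more than the matrix of points. The helper `knn_brute` and the simulated data are hypothetical, introduced only for illustration. Saving and updating reduce to `saveRDS()`, `readRDS()`, and `rbind()`.

```r
# Hypothetical helper: brute-force k nearest neighbours in base R.
# `data` and `query` are numeric matrices with one point per row.
knn_brute <- function(data, query, k) {
  t(apply(query, 1, function(q) {
    d2 <- rowSums(sweep(data, 2, q)^2)  # squared Euclidean distances to q
    order(d2)[seq_len(k)]               # row indices of the k closest points
  }))
}

set.seed(1)
pts <- matrix(rnorm(200), ncol = 2)     # 100 simulated initial points

# "Saving the model" amounts to saving the points themselves.
path <- tempfile(fileext = ".rds")
saveRDS(pts, path)

# Later: reload, append newly arrived points, save again, and query.
pts <- rbind(readRDS(path), matrix(rnorm(20), ncol = 2))
saveRDS(pts, path)

query <- matrix(c(0, 0, 1, 1), ncol = 2, byrow = TRUE)
nn <- knn_brute(pts, query, k = 3)      # one row of neighbour indices per query
```

With the FNN package installed, a call such as `FNN::get.knnx(pts, query, k = 3)` could presumably replace the brute-force helper; it rebuilds its search structure from `pts` on each call.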

The reason is that building a KD-tree takes, in general, at least linearithmic time in the number of points. For large inputs it therefore makes sense to sort the data before building the KD-tree: the tree is built faster and comes out better balanced, which also speeds up search (which is itself worse than logarithmic, in general). This approach speeds up both modeling and evaluation but, of course, discourages incremental updates, since points inserted one at a time can unbalance the tree.
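The balance argument can be illustrated with a small base R sketch. It compares the depth of a KD-tree built in one batch by median splits with the depth of a tree grown by inserting the same points one at a time. Both functions (`batch_depth`, `incremental_depth`) are hypothetical helpers written for this illustration, not part of any package, and the sorted stream is a deliberately bad case for incremental insertion.

```r
# Depth of a KD-tree built in one batch, splitting at the median of
# the current axis. `pts` is a numeric matrix, one point per row.
batch_depth <- function(pts, axis = 1) {
  n <- nrow(pts)
  if (n == 0) return(0)
  ord <- order(pts[, axis])
  mid <- (n + 1) %/% 2                  # position of the median point
  nxt <- axis %% ncol(pts) + 1          # cycle through the axes
  left  <- pts[ord[seq_len(mid - 1)], , drop = FALSE]
  right <- pts[ord[seq_len(n - mid) + mid], , drop = FALSE]
  1 + max(batch_depth(left, nxt), batch_depth(right, nxt))
}

# Depth of a KD-tree grown by inserting rows in arrival order,
# with no rebalancing. Children are stored as row indices (0 = none).
incremental_depth <- function(pts) {
  n <- nrow(pts); d <- ncol(pts)
  if (n == 0) return(0)
  left <- right <- integer(n)
  max_depth <- 1
  for (i in seq_len(n)[-1]) {
    node <- 1; axis <- 1; depth <- 1
    repeat {
      go_left <- pts[i, axis] < pts[node, axis]
      child <- if (go_left) left[node] else right[node]
      if (child == 0) {                 # free slot found: attach point i
        if (go_left) left[node] <- i else right[node] <- i
        break
      }
      node <- child
      axis <- axis %% d + 1
      depth <- depth + 1
    }
    max_depth <- max(max_depth, depth + 1)
  }
  max_depth
}

# Points arriving in sorted order along both axes.
stream <- cbind(1:255, 1:255)
d_batch <- batch_depth(stream)          # 8: a perfectly balanced tree
d_incr  <- incremental_depth(stream)    # 255: the tree degenerates to a chain
```

Randomly ordered arrivals would fare much better than this sorted stream, but without rebalancing nothing guarantees the shallow tree that a batch build produces.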

Your best bet, I think, is to find a generic KD-tree package and use it instead.

167

answered Sep 28 '22 07:09

sds

