Incremental Nearest Neighbor Algorithm in Python [closed]

Tags:

Is anyone aware of a nearest neighbor algorithm implemented in Python that can be updated incrementally? All the ones I've found, such as this one, appear to be batch processes. Is it possible to implement an incremental NN algorithm?

251

asked Nov 25 '10 06:11

Cerin

2 Answers

This is way late, but for posterity:

There is actually a technique for converting batch-processed algorithms like KD-Tree into incremental algorithms: it's called a static-to-dynamic transformation.

To generate an incremental variant of a KD-Tree, you store a set of trees instead of just one tree. When there are N elements in your nearest-neighbor structure, your structure will have a tree for each "1" bit in the binary representation of N. Moreover, if tree T_i corresponds to the i-th bit of N, then tree T_i contains 2^i elements.

So, if you have 11 elements in your structure, then N = 11, or 1011 in binary, and therefore you have three trees - T_3, T_1, and T_0 - with 8 elements, 2 elements, and 1 element, respectively.

Now, let's insert an element e into our structure. After insertion, we'll have 12 elements, or 1100 in binary. Comparing the new and the previous binary string, we see that T_3 doesn't change, we have a new tree T_2 with 4 elements, and trees T_1 and T_0 get deleted. We construct the new tree T_2 by doing a batch insertion of e along with all the elements in the trees "below" T_2, which are T_1 and T_0.

In this way, we create an incremental point query structure from a static base structure. There is, however, an asymptotic slowdown in "incrementalizing" static structures like this in the form of an extra log(N) factor:

inserting N elements in structure: O(N log(N) log(n))
nearest neighbor query for structure with N elements: O(log(n) log(n))

178

answered Oct 13 '22 00:10

giogadi

I think the problem with incremental construction of a KD-tree or KNN-tree is, as you've alluded to in a comment, that the tree will eventually become unbalanced and you can't do simple tree rotation to fix balance problems and keep consistency. At the minimum, the re-balancing task is not trivial and one would definitely not want to do it at each insertion. Often, one will choose to build a tree with a batch method, insert a bunch of new points and allow the tree to become unbalanced up to a point, and then re-balance it.

A very similar thing to do is to build the data structure in batch for M points, use it for M' points, and then re-build the data structure in batch with M+M' points. Since re-balancing is not normal, fast algorithm we are familiar with for trees, rebuilding is not necessarily slow in comparison and in some cases can be faster (depending on how the sequence of the points entering your incremental algorithm).

That being said, the amount of code you write, debugging difficulty, and the ease of others' understanding of your code can be significantly smaller if you take the rebuild approach. If you do so, you can use a batch method and keep an external list of points not yet inserted into the tree. A brute force approach can be used to ensure none of these is closer than the ones in the tree.

Some links to Python implementations/discussions are below, but I haven't found any that explicitly claim to be incremental. Good luck.

http://www.scipy.org/Cookbook/KDTree

http://cgi.di.uoa.gr/~compgeom/pycgalvisual/kdppython.shtml

http://sites.google.com/site/mikescoderama/Home/kd-tree-knn

http://en.wikipedia.org/wiki/Kd-tree

Note: My comments here apply to high-dimensional spaces. If you're working in 2D or 3D, what I've said may not be appropriate. (If you working in very high dimensional spaces, use brute force or approximate nearest neighbor.)

answered Oct 12 '22 22:10

RandomGuy

Related questions
                            
                                Understanding Spacy's Scorer Output
                            
                                Pandas dataframe from dictionary of list values
                            
                                insert row with datetime index to dataframe
                            
                                Repeated import path patterns in python
                            
                                PyTorch Binary Classification - same network structure, 'simpler' data, but worse performance?
                            
                                Why is the memory in GPU still in use after clearing the object?
                            
                                Conda not activated in Power Shell
                            
                                Simultaneous assignment indexing different list elements in Python [duplicate]
                            
                                Has anyone tried NetBeans 6.5 Python IDE?
                            
                                producer/consumer problem with python multiprocessing
                            
                                Do I have any obligations if I upload an egg to the CheeseShop?
                            
                                Looking for a bare-bones open-source editor written in python [closed]
                            
                                Running python script as another user
                            
                                Matplotlib: Formatting dates on the x-axis in a 3D Bar graph
                            
                                Is there a good library for accessing HBase from Python? [closed]
                            
                                Is there anything for Python that is like readability.js?
                            
                                Create and call python function from string via C API
                            
                                understanding zip function
                            
                                Calling staticmethod inside class level containers initialization
                            
                                How do I select a window from a numpy array with periodic boundary conditions?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Incremental Nearest Neighbor Algorithm in Python [closed]

Tags:

python

machine-learning

nearest-neighbor

Cerin

People also ask

2 Answers

giogadi

RandomGuy

Recent Activity

Donate For Us