K-d trees: nearest neighbor search algorithm

This is my understanding of it:

1. Recurse down the tree, taking the left or right subtree according to whether ELEMENT would lie in the left or the right subtree, if it existed.
2. Set CURRENT_BEST to the first leaf node you reach.
3. As you recurse back up, check whether ELEMENT lies closer to the splitting hyperplane than it does to CURRENT_BEST. If so, set CURRENT_BEST to the current node.

This is the part I got from Wikipedia and my class, and the part I don't understand:

4. Check whether any node in the other subtree of the splitting point singled out in step 3 is closer to ELEMENT than the splitting point is.

I don't see why we need step 4, since any point lying in one subtree of the splitting node must necessarily be closer to the splitting node than to any point in the other subtree.

It's obviously my understanding of the algorithm that is flawed, so help will be greatly appreciated.

asked Nov 22 '12 by Kaiser Octavius


2 Answers

Step 4 is the 'else' of step 3: it is what you do when the plane is closer than the point. Just because the point you found lies in the same rectangle as the point you are finding the neighbour for doesn't mean it is the closest.

Imagine the following scenario: you have two points in your kD-Tree, A and B. A is in the middle of its rectangle, while B is just over the edge, in the partition next to A's. Now you search for the nearest neighbour of point C, which sits right next to B but happens to be on the other side of the edge, in A's partition. The initial depth-first search chooses whatever lies in the same partition as your search point, so the first point you find is A. However, B is actually closer, so even though you found A first, you still need to check whether B is closer; otherwise your kD-Tree won't actually give you correct results.

A good way of visualising this is to draw it out:

A-------------C--|--B

A is the first point we found in the DFS, C is the point we want the nearest neighbour of, B is the actual nearest neighbour, and | is our split plane.

Another way to think of it is to draw a circle with radius dist(A,C) around point C. If any other rectangle has any portion of itself fall within this circle, then there is a chance it holds a point closer to C than A is, so it must be checked. If you now find B, you can reduce the radius of your circle (because B is closer) so that fewer rectangles have a chance of intersecting, and once you have checked all the rectangles that intersect your circle (shrinking the circle as you find closer neighbours) you can definitively say there are no closer points.

answered Nov 15 '22 by jkflying

I wrote a basic C++ implementation on GitHub. It has both an iterative and a recursive version.

answered Nov 15 '22 by gvd