For each point in a data set, I need to find all of its nearest neighbors. The data set contains approximately 10 million 2D points. The points lie close to a grid but do not form a precise grid. In my opinion this rules out KD trees, where the basic assumption is that no two points share the same x or y coordinate. I need a fast algorithm, O(n) or better (but not too difficult to implement :-)), to solve this problem. Since Boost is not standardized, I do not want to use it. Thanks for your answers or code samples.


All k nearest neighbors in 2D, C++

2 Answers

I would do the following:

1. Create a coarser grid on top of the points, with cells larger than the spacing between your points.
2. Go through the points linearly and, for each one, work out which large "cell" it belongs to, adding the point to a list associated with that cell. (This takes constant time per point: just do an integer division of the point's coordinates by the cell size.)
3. Go through the points linearly again. To find the 10 nearest neighbors of a point, you only need to look at the points in its own cell and in the adjacent large cells. Since your points are fairly evenly scattered, you can do this in time proportional to the number of points in each (large) cell.
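The steps above could be sketched roughly as follows in standard C++ (no Boost). This is only an illustration, not the answerer's code: the names `Point`, `Grid`, `buildGrid` and `kNearest` are made up for the example, the input is assumed to be non-empty, and the cell size is left to the caller. One addition beyond the steps as written: if the 3x3 block of cells cannot yet guarantee the k closest points, the sketch widens the block by one ring of cells and tries again, so the result stays exact even when the cells were chosen too small.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Uniform grid: bucket[cy * nx + cx] holds the indices of the points in that cell.
struct Grid {
    double minX, minY, cell;
    int nx, ny;
    std::vector<std::vector<int>> bucket;

    // Cell coordinates via division by the cell size (clamped defensively).
    int cx(double x) const { return std::min(nx - 1, static_cast<int>((x - minX) / cell)); }
    int cy(double y) const { return std::min(ny - 1, static_cast<int>((y - minY) / cell)); }
};

// Steps 1 and 2: lay a grid over the bounding box and bin every point.
// Assumes pts is non-empty and cell > 0.
Grid buildGrid(const std::vector<Point>& pts, double cell) {
    Grid g{pts[0].x, pts[0].y, cell, 1, 1, {}};
    double maxX = pts[0].x, maxY = pts[0].y;
    for (const Point& p : pts) {
        g.minX = std::min(g.minX, p.x); maxX = std::max(maxX, p.x);
        g.minY = std::min(g.minY, p.y); maxY = std::max(maxY, p.y);
    }
    g.nx = static_cast<int>((maxX - g.minX) / cell) + 1;
    g.ny = static_cast<int>((maxY - g.minY) / cell) + 1;
    g.bucket.resize(static_cast<std::size_t>(g.nx) * g.ny);
    for (int i = 0; i < static_cast<int>(pts.size()); ++i)
        g.bucket[static_cast<std::size_t>(g.cy(pts[i].y)) * g.nx + g.cx(pts[i].x)].push_back(i);
    return g;
}

// Step 3: indices of the k nearest neighbours of point i, nearest first.
std::vector<int> kNearest(const std::vector<Point>& pts, const Grid& g, int i, int k) {
    if (k <= 0) return {};
    const Point& q = pts[i];
    const int qx = g.cx(q.x), qy = g.cy(q.y);
    std::vector<std::pair<double, int>> cand;  // (squared distance, point index)
    for (int r = 1;; ++r) {                    // r = 1 is the 3x3 block of cells
        cand.clear();
        for (int y = std::max(0, qy - r); y <= std::min(g.ny - 1, qy + r); ++y)
            for (int x = std::max(0, qx - r); x <= std::min(g.nx - 1, qx + r); ++x)
                for (int j : g.bucket[static_cast<std::size_t>(y) * g.nx + x])
                    if (j != i) {
                        const double dx = pts[j].x - q.x, dy = pts[j].y - q.y;
                        cand.emplace_back(dx * dx + dy * dy, j);
                    }
        const bool coversAll = qx - r <= 0 && qy - r <= 0 &&
                               qx + r >= g.nx - 1 && qy + r >= g.ny - 1;
        const std::size_t m = std::min(cand.size(), static_cast<std::size_t>(k));
        std::partial_sort(cand.begin(), cand.begin() + static_cast<std::ptrdiff_t>(m), cand.end());
        // Any point outside the scanned block is at least r * cell away from q.
        const double safe = r * g.cell;
        if (coversAll || (m == static_cast<std::size_t>(k) && cand[m - 1].first <= safe * safe)) {
            std::vector<int> out;
            for (std::size_t t = 0; t < m; ++t) out.push_back(cand[t].second);
            return out;
        }
    }
}
```

Calling `buildGrid` once and then `kNearest` for every index gives all neighbor lists. With evenly scattered points and a sensible cell size, most queries should finish at `r = 1`, which is the case the steps describe.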

Here is an (ugly) pic describing the situation:

[Image: https://i.stack.imgur.com/IZGdB.png]

The cells must be large enough that the center cell and its adjacent cells together contain the 10 closest points, but small enough to keep the computation fast. You could see it as a "hash function" that puts the closest points in the same bucket, or in neighboring ones.
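One way to pick a starting cell size, as a rough heuristic rather than anything stated in the answer: assume the n points are spread roughly uniformly over a box of known width and height, then choose the side length so that a 3x3 block of cells holds about `margin * k` points on average. The function name `suggestCellSize` and the `margin` parameter are hypothetical, and the default of 2 is a guess to be tuned on real data.

```cpp
#include <cmath>
#include <cstddef>

// Side length s such that a 3x3 block of cells (area 9 * s * s) holds about
// margin * k points, assuming n points spread roughly uniformly over a
// width x height bounding box. margin > 1 leaves slack for uneven density.
double suggestCellSize(double width, double height, std::size_t n, int k,
                       double margin = 2.0) {
    const double pointsPerBlock = margin * k;
    const double areaPerPoint = width * height / static_cast<double>(n);
    return std::sqrt(pointsPerBlock * areaPerPoint / 9.0);
}
```

A larger `margin` makes it more likely that the adjacent cells already contain all k neighbors, at the cost of more distance computations per point.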

(Note that, strictly speaking, it's not O(n), but by tweaking the size of the larger cells you should get close enough. :-))

answered Oct 17 '22 03:10

aioobe

I have used a library called ANN (Approximate Nearest Neighbour) with great success. It does use a kd-tree approach, although it offers more than one algorithm to try. I used it for point location on a triangulated surface. You might have some luck with it. It is minimal, and it was easy to include in my library just by dropping in its source.

Good luck with this interesting task!

answered Oct 17 '22 03:10

Daniel Lidström


Tags: c++, algorithm, large-data, nearest-neighbor

Ian
