Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best data structure for nearest neighbour in 1 dimension

I have a list of values (1-dimensional) and I would like to know the best data structure / algorithm for finding the nearest to a query value I have. Most of the solutions (all?) I found for questions here are for 2 or more dimensions. Can anybody suggest to me the approach for my case?

My instinct tells me to sort the data and use binary search somehow. By the way, there is no limit on the construction or insertion time for any tree needed, so probably someone can suggest a better tree than simply a sorted list.

like image 584
Muhammad Alkarouri Avatar asked Jul 18 '10 18:07

Muhammad Alkarouri


People also ask

Is KNN good for high dimensional data?

Because, in high-dimensional spaces, the k-NN algorithm faces two difficulties: It becomes computationally more expensive to compute distance and find the nearest neighbors in high-dimensional space.

What type of algorithm is nearest neighbor?

The k-nearest neighbors (KNN) algorithm is a simple, supervised machine learning algorithm that can be used to solve both classification and regression problems.

What sort of data does k-nearest neighbors classify best on?

It can be used for both classification and regression problems. It's ideal for non-linear data since there's no assumption about underlying data. It can naturally handle multi-class cases.

What is the VC dimension of K nearest neighbor?

VC dimension of kNN with k=1 is infinite.


2 Answers

If you need something faster than O(log(n)), which you can easily get with a sorted array or a binary search tree, you can use a van Emde Boas Tree. vEB trees give you O(log(log(n))) to search for the closest element on either side.

like image 53
mathmike Avatar answered Oct 28 '22 16:10

mathmike


If insertion time is irrelevant, then binary search on a sorted array is the simplest way to achieve O(log N) query time. Each time an item is added sort everything. For each query, perform a binary search. If a match is found, return it. Otherwise, the binary search should return the index of the item, where it should have been inserted. Use this index to check the two neighboring items and determine which of them is closer to the query point.

I suppose that there are solutions with O(1) time. I will try to think of one that doesn't involve too much memory usage...

like image 32
Eyal Schneider Avatar answered Oct 28 '22 14:10

Eyal Schneider