Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I use arbitrary metrics to search KD-Trees?

I just finished implementing a kd-tree for doing fast nearest neighbor searches. I'm interested in playing around with different distance metrics other than the Euclidean distance. My understanding of the kd-tree is that the speedy kd-tree search is not guaranteed to give exact searches if the metric is non-Euclidean, which means that I might need to implement a new data structure and search algorithm if I want to try out new metrics for my search.

I have two questions:

  1. Does using a kd-tree permanently tie me to the Euclidean distance?
  2. If so, what other sorts of algorithms should I try that work for arbitrary metrics? I don't have a ton of time to implement lots of different data structures, but other structures I'm thinking about include cover trees and vp-trees.
like image 662
James Thompson Avatar asked Apr 01 '09 05:04

James Thompson


People also ask

Why are kd trees not suitable for efficiently finding the nearest neighbor in high dimensional spaces?

The reason that k-d trees are unsuitable for finding nearest neighbours in high dimensions is related to the so-called curse of dimensionality.

Is KD tree nearest Neighbour also a neighborhood search algorithm?

The KD Tree Algorithm is one of the most commonly used Nearest Neighbor Algorithms. The data points are split at each node into two sets. Like the previous algorithm, the KD Tree is also a binary tree algorithm always ending in a maximum of two nodes. The split criteria chosen are often the median.

How do you Kd a tree?

General procedure to construct a k-d tree is to recursively divide the space in 2 parts along the axis that has widest spread. Every node in the tree indicates along which dimension the space was divided by the node.

Why is KD tree used for Knn?

Advantages of using KDTree At each level of the tree, KDTree divides the range of the domain in half. Hence they are useful for performing range searches. It is an improvement of KNN as discussed earlier. The complexity lies in between O(log N) to O(N) where N is the number of nodes in the tree.


1 Answers

The nearest-neighbour search procedure described on the Wikipedia page you linked to can certainly be generalised to other distance metrics, provided you replace "hypersphere" with the equivalent geometrical object for the given metric, and test each hyperplane for crossings with this object.

Example: if you are using the Manhattan distance instead (i.e. the sum of the absolute values of all differences in vector components), your hypersphere would become a (multidimensional) diamond. (This is easiest to visualise in 2D -- if your current nearest neighbour is at distance x from the query point p, then any closer neighbour behind a different hyperplane must intersect a diamond shape that has width and height 2x and is centred on p). This might make the hyperplane-crossing test more difficult to code or slower to run, however the general principle still applies.

like image 76
j_random_hacker Avatar answered Oct 11 '22 04:10

j_random_hacker