Can I use arbitrary metrics to search KD-Trees?

I just finished implementing a kd-tree for fast nearest-neighbor searches, and I'd like to experiment with distance metrics other than Euclidean distance. My understanding is that the fast kd-tree search is not guaranteed to return exact results if the metric is non-Euclidean, which means that I might need to implement a new data structure and search algorithm if I want to try out new metrics.

I have two questions:

1. Does using a kd-tree permanently tie me to the Euclidean distance?
2. If so, what other sorts of algorithms should I try that work for arbitrary metrics? I don't have a ton of time to implement lots of different data structures, but other structures I'm thinking about include cover trees and vp-trees.

asked Apr 01 '09 05:04

James Thompson

1 Answer

The nearest-neighbour search procedure described on the Wikipedia page you linked to can certainly be generalised to other distance metrics, provided you replace "hypersphere" with the equivalent geometrical object for the given metric, and test each hyperplane for crossings with this object.

Example: if you are using the Manhattan distance instead (i.e. the sum of the absolute values of all differences in vector components), your hypersphere would become a (multidimensional) diamond. This is easiest to visualise in 2D: if your current nearest neighbour is at distance x from the query point p, then any closer neighbour behind a different hyperplane must lie inside a diamond shape that has width and height 2x and is centred on p. This might make the hyperplane-crossing test more difficult to code or slower to run; however, the general principle still applies.
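
To make the idea concrete, here is a minimal Python sketch of a kd-tree search with a pluggable metric. It is not taken from the question or from any particular library; the names `Node`, `build`, `nearest`, `manhattan`, `euclidean` and `chebyshev` are made up for illustration. It assumes axis-aligned splitting planes and a metric that is never smaller than the absolute difference along any single coordinate, which holds for Manhattan, Euclidean and Chebyshev distance. Under that assumption the "ball" of radius x around p (sphere, diamond or cube) reaches exactly x along each axis, so the crossing test reduces to comparing one coordinate difference with the current best distance. For a metric without that property this pruning would not be safe.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


class Node:
    """One kd-tree node: a point, the axis it splits on, and two subtrees."""

    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a: Point, b: Point) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Point, metric: Metric,
            best: Optional[Tuple[float, Point]] = None
            ) -> Optional[Tuple[float, Point]]:
    """Return (distance, point) of the nearest neighbour of `query`.

    Assumption: metric(a, b) >= |a[i] - b[i]| for every axis i.
    """
    if node is None:
        return best
    d = metric(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    # Signed offset of the query from the splitting plane along this axis.
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, metric, best)
    # Every point on the far side is at least |diff| away, so the far
    # subtree only matters if the current ball crosses the plane.
    if abs(diff) < best[0]:
        best = nearest(far, query, metric, best)
    return best


points = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build(points)
print(nearest(tree, (8.0, 0.0), manhattan))  # (1.0, (8.0, 1.0))
```

Swapping `manhattan` for `euclidean` or `chebyshev` changes only the distance computation; the tree and the pruning test stay the same under the stated assumption.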

answered Oct 11 '22 04:10

j_random_hacker

Tags: algorithm, search, math, data-structures, machine-learning
