Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find Closest Vector from a List of Vectors | Python

If you are given say a list of 10 vectors, called A that represent different groups. Then you have a time series of vectors v1,v2,...,vn, each being a vector as well. I was wondering if there was a way to find the "closest" vector in A for each v1,v2,...,vn if you define some distance metric?

Is there a quick way to do this besides for looping through and just comparing all entries?

Edit: No I am not asking how to do k-means or something like that.

like image 637
ajl123 Avatar asked Sep 07 '15 22:09

ajl123


3 Answers

You can use the spatial KDtree in scipy. It uses a fast tree algorithm to identify close by points for vectors of arbitrary dimension.

Edit: sorry, if you are looking for arbitrary distance metrics, a Tree like structure might still be an option.

Here is an example:

>>> from scipy import spatial
>>> A = [[0,1,2,3,4], [4,3,2,1,0], [2,5,3,7,1], [1,0,1,0,1]]
>>> tree = spatial.KDTree(A)

This sets up the KDTree with all the points in A, allowing you to perform fast spatial searches within it. Such a query takes a vector and returns the closest neighbor in A for it:

>>> tree.query([0.5,0.5,0.5,0.5,0.5])
(1.1180339887498949, 3)

The first return value is the distance of the closest neighbor and the second its position in A, such that you can obtain it for example like this:

>>> A[ tree.query([0.5,0.5,0.5,0.5,0.5])[1] ]
[1, 0, 1, 0, 1]
like image 136
haraldkl Avatar answered Oct 03 '22 07:10

haraldkl


If you define a metric, you can use it in the min function:

closest = min(A, key=distance)
like image 39
jojonas Avatar answered Oct 03 '22 07:10

jojonas


So some sample code is:

# build a KD-tree to compare to some array of vectors 'centall'
tree = scipy.spatial.KDTree(centall) 
print 'shape of tree is ', tree.data.shape

# loop through different regions and identify any clusters that belong to a different region
[d1, i1] = tree.query(group1)
[d2, i2] = tree.query(group2)

This returns variables d and i. d stores the closest distance i returns the index at which this happens

Hope this helps.

like image 1
ajl123 Avatar answered Oct 03 '22 07:10

ajl123