Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Max Distance between 2 points in a data set and identifying the points

I have a matrix consisting of x,y,z coordinates of several points. I would like to find the extremum points i.e. the two points that are farthest apart.

I could figure out a way in matlab, but i need it in Python

Here is the code in matlab

A = randint(500,3,[-5 5]);
D=pdist(A);
D=squareform(D);
[N,I]=max(D(:));
[I_row, I_col] = ind2sub(size(D),I);

pdist gives the distance between pairs of points(i,j). squareform gives the matrix output In last two steps I attempt to find the indices of the matrix I_row, I_col..

Points I_row and I_col have the max distance..

Could anybody suggest me an efficient way in python as all my other codes are in Python.

like image 967
Mechanician Avatar asked Jul 28 '15 03:07

Mechanician


3 Answers

All the other answers here take O(N^2) time and space. That's terrible.

Instead, recognize that the two farthest points in a dataset lie on the set's convex hull. Since hulls can be computed in O(N log N) time in 3D this forms an efficient prefilter. In my testing on a dataset with 16,000,000 points the convex hull contained only 420 points.

The farthest points on the hull can be found in O(H log H) time, though this is harder to implement in Python. So we can instead at that point fall back to the O(N^2) cdist solutions.

import numpy as np
from scipy.spatial import ConvexHull
from scipy.spatial.distance import cdist

N = 16000000

# Find a convex hull in O(N log N)
points = np.random.rand(N, 3)   # N random points in 3-D

# Returned 420 points in testing
hull = ConvexHull(points)

# Extract the points forming the hull
hullpoints = points[hull.vertices,:]

# Naive way of finding the best pair in O(H^2) time if H is number of points on
# hull
hdist = cdist(hullpoints, hullpoints, metric='euclidean')

# Get the farthest apart points
bestpair = np.unravel_index(hdist.argmax(), hdist.shape)

#Print them
print([hullpoints[bestpair[0]],hullpoints[bestpair[1]]])
like image 188
Richard Avatar answered Oct 10 '22 19:10

Richard


If you have scipy, you have exact equivalent for most of matlab core functions :

from numpy import random, nanmax, argmax, unravel_index
from scipy.spatial.distance import pdist, squareform

A = random.randint(-5,5, (500,3))
D = pdist(A)
D = squareform(D);
N, [I_row, I_col] = nanmax(D), unravel_index( argmax(D), D.shape )

You can also get it in pure python using itertools :

from itertools import combinations
from random import randint

A = [[randint(-5,5) for coord in range(3)] for point in range(500)]

def square_distance(x,y): return sum([(xi-yi)**2 for xi, yi in zip(x,y)])    

max_square_distance = 0
for pair in combinations(A,2):
    if square_distance(*pair) > max_square_distance:
        max_square_distance = square_distance(*pair)
        max_pair = pair
like image 3
Yann Avatar answered Oct 10 '22 17:10

Yann


This will give you pairs of indexes of the points in A that are furthest apart as temp_b. Note that it will include both directions such as (8, 222) and (222, 8) in the list. I will leave it to you remove them if you want.

import numpy as np
import random as rd
from scipy.spatial.distance import pdist
from scipy.spatial.distance import squareform

A = np.array([np.array([rd.randint(-5,5) for x in range(3)]) for y in range(500)])
D=pdist(A)
D=squareform(D)
temp = np.where(D == D.max())
temp_b = zip(temp[0],temp[1])
like image 2
dmargol1 Avatar answered Oct 10 '22 19:10

dmargol1