I need to find the two points which are most far away from each other. I have, as the screenshots say, an array containing two other arrays. one for the X and one for the Y coordinates. What's the best way to determine the longest line through the data? by saying this, i need to select the two most far away points in the plot. Hope you guys can help. Below are some screenshots to help explain the problem.
You can avoid computing all pairwise distances by observing that the two points which are furthest apart will occur as vertices in the convex hull. You can then compute pairwise distances between fewer points.
For example, with 100,000 points distributed uniformly in a unit square, there are only 22 points in the convex hull in my instance.
import numpy as np
from scipy import spatial
# test points
pts = np.random.rand(100_000, 2)
# two points which are fruthest apart will occur as vertices of the convex hull
candidates = pts[spatial.ConvexHull(pts).vertices]
# get distances between each pair of candidate points
dist_mat = spatial.distance_matrix(candidates, candidates)
# get indices of candidates that are furthest apart
i, j = np.unravel_index(dist_mat.argmax(), dist_mat.shape)
print(candidates[i], candidates[j])
# e.g. [ 1.11251218e-03 5.49583204e-05] [ 0.99989971 0.99924638]
If your data is 2-dimensional, you can compute the convex hull in O(N*log(N))
time where N
is the number of points. By concentration of measure, this method deteriorates in performance for many common distributions as the number of dimensions grows.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With