I have a set of points W = {(x1, y1), (x2, y2), ..., (xn, yn)} on the 2D plane. Can you find an algorithm that takes these points as input and returns a point (x, y) on the 2D plane that has the minimum sum of distances from the points in W? In other words, if
di = Euclidean_distance((x, y), (xi, yi))
I want to minimize:
d1 + d2 + ... + dn
In geometry, the geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points.
What is Euclidean distance? The Euclidean distance between two points is the length of the line segment joining them. For (x, y) and (xi, yi) on the plane, that is sqrt((x - xi)^2 + (y - yi)^2).
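To make the quantity being minimized concrete, here is a minimal sketch of the objective as a Python function. The name `sum_of_distances` and the unit-square test points are illustrative assumptions, not part of the original question.

```python
import numpy as np

def sum_of_distances(point, pts):
    """Return the sum of Euclidean distances from `point` to each row of `pts`."""
    point = np.asarray(point, dtype=float)
    pts = np.asarray(pts, dtype=float)
    # Row-wise norm of the differences gives d1, d2, ..., dn
    return float(np.linalg.norm(pts - point, axis=1).sum())

# Hypothetical example data: the four corners of the unit square
square = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
print(sum_of_distances([0.5, 0.5], square))  # candidate at the center
print(sum_of_distances([0.0, 0.0], square))  # candidate at a corner
```

For this symmetric example the center gives a smaller value than the corner, which is what a geometric median solver should find.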
The Problem
You're looking for the geometric median.
An Easy Solution
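There is no closed-form solution to this problem, so iterative or probabilistic methods are used. The easiest way to find this is probably with Weiszfeld's algorithm, which repeatedly replaces the current estimate with a weighted average of the data points, each weighted by the inverse of its distance to the current estimate.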
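The Weiszfeld update can be written out as follows. The notation is an assumption made here for illustration: p_i is the i-th data point, c^{(k)} is the estimate after k iterations, and the norm is the Euclidean norm.

```latex
c^{(k+1)} =
  \frac{\displaystyle\sum_{i=1}^{n} \frac{p_i}{\lVert p_i - c^{(k)} \rVert}}
       {\displaystyle\sum_{i=1}^{n} \frac{1}{\lVert p_i - c^{(k)} \rVert}}
```

The numerator and denominator correspond to the `num` and `denom` accumulators in the Python code.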
We can implement this in Python as follows:
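import numpy as np
from numpy.linalg import norm as npnorm

# Test data, defined here so the snippet runs on its own
# (same setup as in the SOCP example further down)
POINT_NUM = 100
pts = np.random.rand(POINT_NUM, 2)

c_pt_old = np.random.rand(2)      # only used to enter the loop
c_pt_new = np.array([0.0, 0.0])   # starting estimate

# Iterate until the estimate moves by less than the tolerance
while npnorm(c_pt_old - c_pt_new) > 1e-6:
    num = 0
    denom = 0
    for i in range(POINT_NUM):
        dist = npnorm(c_pt_new - pts[i, :])
        num += pts[i, :] / dist   # point weighted by inverse distance
        denom += 1 / dist         # sum of the weights
    c_pt_old = c_pt_new
    c_pt_new = num / denom

print(c_pt_new)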
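There's a chance that Weiszfeld's algorithm won't converge: if an estimate lands exactly on one of the data points, the distance is zero and the update divides by zero. It might therefore be best to run it several times from different starting points.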
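One way to act on that advice is sketched below. It is not the code from the answer: the function names `weiszfeld` and `best_of_restarts`, the `eps` clamp on tiny distances, and the iteration cap are all assumptions added for illustration.

```python
import numpy as np

def weiszfeld(pts, start, tol=1e-6, max_iter=1000, eps=1e-12):
    """Run the Weiszfeld iteration from `start` and return the final estimate."""
    c = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        dists = np.linalg.norm(pts - c, axis=1)
        # Assumption: clamp tiny distances so that an estimate sitting on a
        # data point does not cause a division by zero.
        weights = 1.0 / np.maximum(dists, eps)
        c_next = (weights[:, None] * pts).sum(axis=0) / weights.sum()
        if np.linalg.norm(c_next - c) < tol:
            return c_next
        c = c_next
    return c

def best_of_restarts(pts, n_starts=5, seed=0):
    """Run weiszfeld from several random starts and keep the lowest-cost result."""
    rng = np.random.default_rng(seed)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    # Random starting points inside the bounding box of the data
    candidates = [weiszfeld(pts, rng.uniform(lo, hi)) for _ in range(n_starts)]
    costs = [np.linalg.norm(pts - c, axis=1).sum() for c in candidates]
    return candidates[int(np.argmin(costs))]

pts = np.random.default_rng(1).random((100, 2))
print(best_of_restarts(pts))
```

The clamp only prevents NaN values. A run that starts on a data point can still stall there, which is why the sketch compares the sum of distances across restarts and keeps the best candidate.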
A General Solution
You can also find this using second-order cone programming (SOCP). In addition to solving your specific problem, this general formulation allows you to easily add constraints and weightings, such as variable uncertainty in the location of each data point.
To do so, you create one auxiliary variable per data point, each constrained to be at least the distance between the proposed center point and that data point.
You then minimize the sum of these auxiliary variables, which pushes each one down onto the actual distance. The result follows:
import cvxpy as cp
import numpy as np
import matplotlib.pyplot as plt

# Generate random test data
POINT_NUM = 100
pts = np.random.rand(POINT_NUM, 2)

c_pt = cp.Variable(2)               # The center point we wish to locate
distances = cp.Variable(POINT_NUM)  # Distance from the center point to each data point

# Generate constraints. These are used to hold distances.
constraints = []
for i in range(POINT_NUM):
    constraints.append(cp.norm(c_pt - pts[i, :]) <= distances[i])

objective = cp.Minimize(cp.sum(distances))
problem = cp.Problem(objective, constraints)
optimal_value = problem.solve()

print("Optimal value = {0}".format(optimal_value))
print("Optimal location = {0}".format(c_pt.value))

plt.scatter(x=pts[:, 0], y=pts[:, 1], s=1)
plt.scatter(c_pt.value[0], c_pt.value[1], s=10)
plt.show()
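Written as an optimization problem, the model built above looks roughly like this. The symbols are assumptions chosen here for illustration: c stands for `c_pt`, t_i for `distances[i]`, and p_i for `pts[i,:]`.

```latex
\begin{aligned}
\min_{c \in \mathbb{R}^2,\; t \in \mathbb{R}^n} \quad & \sum_{i=1}^{n} t_i \\
\text{subject to} \quad & \lVert c - p_i \rVert_2 \le t_i, \qquad i = 1, \dots, n
\end{aligned}
```

Each constraint is a second-order cone constraint. At the optimum every constraint is tight, so t_i equals the distance to p_i and the objective equals the sum of distances. A per-point weight w_i would presumably enter the objective as a sum of w_i t_i terms. That is one reading of the "weightings" mentioned earlier, not something the code above implements.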
SOCPs can be solved by a number of solvers, including CPLEX, Elemental, ECOS, ECOS_BB, GUROBI, MOSEK, CVXOPT, and SCS.
I've tested both approaches, and they give the same answer to within tolerance.
Weiszfeld, E. (1937). "Sur le point pour lequel la somme des distances de n points donnés est minimum". Tôhoku Mathematical Journal. 43: 355–386.