Geometric median of multidimensional points

Tags:

I have an array of 3D points:

a = np.array([[2., 3., 8.], [10., 4., 3.], [58., 3., 4.], [34., 2., 43.]])

How can I compute the geometric median of those points?

370

asked May 18 '15 09:05

orlp

3 Answers

I implemented Yehuda Vardi and Cun-Hui Zhang's algorithm for the geometric median, described in their paper "The multivariate L1-median and associated data depth". Everything is vectorized in numpy, so should be very fast. I didn't implement weights - only unweighted points.

import numpy as np
from scipy.spatial.distance import cdist, euclidean

def geometric_median(X, eps=1e-5):
    y = np.mean(X, 0)

    while True:
        D = cdist(X, [y])
        nonzeros = (D != 0)[:, 0]

        Dinv = 1 / D[nonzeros]
        Dinvs = np.sum(Dinv)
        W = Dinv / Dinvs
        T = np.sum(W * X[nonzeros], 0)

        num_zeros = len(X) - np.sum(nonzeros)
        if num_zeros == 0:
            y1 = T
        elif num_zeros == len(X):
            return y
        else:
            R = (T - y) * Dinvs
            r = np.linalg.norm(R)
            rinv = 0 if r == 0 else num_zeros/r
            y1 = max(0, 1-rinv)*T + min(1, rinv)*y

        if euclidean(y, y1) < eps:
            return y1

        y = y1

In addition to the default SO license terms, I release the code above under the zlib license, if you so prefer.

answered Oct 07 '22 17:10

orlp

The calculation of the geometric median with the Weiszfeld's iterative algorithm is implemented in Python in this gist or in the function below copied from the OpenAlea software (CeCILL-C license),

import numpy as np
import math
import warnings

def geometric_median(X, numIter = 200):
    """
    Compute the geometric median of a point sample.
    The geometric median coordinates will be expressed in the Spatial Image reference system (not in real world metrics).
    We use the Weiszfeld's algorithm (http://en.wikipedia.org/wiki/Geometric_median)

    :Parameters:
     - `X` (list|np.array) - voxels coordinate (3xN matrix)
     - `numIter` (int) - limit the length of the search for global optimum

    :Return:
     - np.array((x,y,z)): geometric median of the coordinates;
    """
    # -- Initialising 'median' to the centroid
    y = np.mean(X,1)
    # -- If the init point is in the set of points, we shift it:
    while (y[0] in X[0]) and (y[1] in X[1]) and (y[2] in X[2]):
        y+=0.1

    convergence=False # boolean testing the convergence toward a global optimum
    dist=[] # list recording the distance evolution

    # -- Minimizing the sum of the squares of the distances between each points in 'X' and the median.
    i=0
    while ( (not convergence) and (i < numIter) ):
        num_x, num_y, num_z = 0.0, 0.0, 0.0
        denum = 0.0
        m = X.shape[1]
        d = 0
        for j in range(0,m):
            div = math.sqrt( (X[0,j]-y[0])**2 + (X[1,j]-y[1])**2 + (X[2,j]-y[2])**2 )
            num_x += X[0,j] / div
            num_y += X[1,j] / div
            num_z += X[2,j] / div
            denum += 1./div
            d += div**2 # distance (to the median) to miminize
        dist.append(d) # update of the distance evolution

        if denum == 0.:
            warnings.warn( "Couldn't compute a geometric median, please check your data!" )
            return [0,0,0]

        y = [num_x/denum, num_y/denum, num_z/denum] # update to the new value of the median
        if i > 3:
            convergence=(abs(dist[i]-dist[i-2])<0.1) # we test the convergence over three steps for stability
            #~ print abs(dist[i]-dist[i-2]), convergence
        i += 1
    if i == numIter:
        raise ValueError( "The Weiszfeld's algoritm did not converged after"+str(numIter)+"iterations !!!!!!!!!" )
    # -- When convergence or iterations limit is reached we assume that we found the median.

    return np.array(y)

Alternatively, you could use the C implementation, mentionned in this answer, and interface it to python with, for instance, ctypes.

answered Oct 07 '22 17:10

rth

The problem can be easily approximated with minimize module in scipy. In this module, it provides various optimization algorithms, from nelder-mead to newton-CG. Nelder-mead algorithm is particular useful if you do not want to bother with high order derivatives, at a cost of losing some precision. Nevertheless, you just need to know the function to be minimized for nelder-mead algorithm to work.

Now, referring to the same array in the questions, if we use @orlp's method, we will get this:

geometric_median(a)
# array([12.58942481,  3.51573852,  7.28710661])

For Nelder-mead method, you will see below. The function to be minimized is the distance function from all points i.e.

a formula

Here is the code:

from scipy.optimize import minimize
x = [point[0] for point in a]
y = [point[1] for point in a]
z = [point[2] for point in a]

x0 = np.array([sum(x)/len(x),sum(y)/len(y), sum(z)/len(z)])
def dist_func(x0):
    return sum(((np.full(len(x),x0[0])-x)**2+(np.full(len(x),x0[1])-y)**2+(np.full(len(x),x0[2])-z)**2)**(1/2))
res = minimize(dist_func, x0, method='nelder-mead', options={'xtol': 1e-8, 'disp': True})
res.x
# array([12.58942487,  3.51573846,  7.28710679])

Note that I use to mean of all points as the initial values for the alogrithm. The result is quite close to @orlp's method, which is more accurate. As I mentioned, you sacrifice a bit but still get quite good approximates.

Performance of Nelder Mead Algorithm For this, I generated a test_array with 10000 entries of points from normal distribution centred at 3.2. Therefore, the geometric median should be quite close to [3.2, 3.2, 3.2].

np.random.seed(3)
test_array = np.array([[np.random.normal(3.2,20),
                        np.random.normal(3.2,20),
                        np.random.normal(3.2,20)] for i in np.arange(10000)])

For @orlp's method,

%timeit geometric_median(test_array)
# 12.1 ms ± 270 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# array([2.95151061, 3.14098477, 3.01468281])

For Nelder mead,

%timeit res.x
# 565 ms ± 14.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# array([2.95150898, 3.14098468, 3.01468276])

@orlp's method is very fast while Nelder mead is not bad. However, Nelder mead method is generic whereas @orlp's is specific to geometric median. The method you would like to choose would depend on your purpose. If you just want an approximate, I will choose Nelder at ease. If you want to be exact, @orlp's method is both faster and more accurate.

answered Oct 07 '22 19:10

shiba

Related questions
                            
                                Is there a way to make python become interactive in the middle of a script?
                            
                                Why is i++++++++i valid in python?
                            
                                csrf error in django
                            
                                need the average from a list of timedelta objects
                            
                                How can I close an image shown to the user with the Python Imaging Library?
                            
                                Saving a Numpy array as an image (instructions)
                            
                                Python FTP get the most recent file by date
                            
                                How can I flip an image along the vertical axis with python? [closed]
                            
                                Python combine two for loops
                            
                                ordering shuffled points that can be joined to form a polygon (in python)
                            
                                python regex findall and multiline
                            
                                from . import * from module
                            
                                UnicodeDecodeError in Python 3 when importing a CSV file
                            
                                in pandas how can I groupby weekday() for a datetime column?
                            
                                How to route a chain of tasks to a specific queue in celery?
                            
                                convert selected datetime to date in sqlalchemy
                            
                                Reverse Indexing in Python?
                            
                                Group dictionary key values in python
                            
                                django admin inline many to many custom fields
                            
                                python find difference between two lists [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Geometric median of multidimensional points

Tags:

python

numpy

scipy

orlp

People also ask

3 Answers

orlp

rth

shiba

Recent Activity

Donate For Us