
How to find out Geometric Median

The question is:

Given N points(in 2D) with x and y coordinates, find a point P (in N given points) such that the sum of distances from other(N-1) points to P is minimum.

This point is commonly known as Geometric Median. Is there any efficient algorithm to solve this problem, other than the naive O(N^2) one?

asked Oct 17 '12 by SexyBeast


2 Answers

I once solved something similar for a local online judge using simulated annealing. That was the official solution as well, and the program got AC (accepted).

The only difference was that the point I had to find did not have to be part of the N given points.

This was my C++ code; N could be as large as 50000, and the program executed in 0.1 s on a 2 GHz Pentium 4.

// header files for IO functions and math
#include <cstdio>
#include <cmath>

// the maximum value n can take
const int maxn = 50001;

// given a point (x, y) on a grid, we can find its left/right/up/down neighbors
// by using these constants: (x + dx[0], y + dy[0]) = upper neighbor etc.
const int dx[] = {-1, 0, 1, 0};
const int dy[] = {0, 1, 0, -1};

// controls the precision - this should give you an answer accurate to 3 decimals
const double eps = 0.001;

// input and output files
FILE *in = fopen("adapost2.in","r"), *out = fopen("adapost2.out","w");

// stores a point in 2d space
struct punct
{
    double x, y;
};

// how many points are in the input file
int n;

// stores the points in the input file
punct a[maxn];

// stores the answer to the question
double x, y;

// finds the sum of (euclidean) distances from each input point to (x, y)
double dist(double x, double y)
{
    double ret = 0;

    for ( int i = 1; i <= n; ++i )
    {
        double dx = a[i].x - x;
        double dy = a[i].y - y;

        ret += sqrt(dx*dx + dy*dy); // classical distance formula
    }

    return ret;
}

// reads the input
void read()
{
    fscanf(in, "%d", &n); // read n from the first line

    // read n points next, one on each line
    for ( int i = 1; i <= n; ++i )
        fscanf(in, "%lf %lf", &a[i].x, &a[i].y), // reads a point
        x += a[i].x,
        y += a[i].y; // we sum the x and y values now, because we will start by
                     // approximating the answer as the center of gravity

    // divide by the number of points (n) to get the center of gravity
    x /= n;
    y /= n;
}

// implements the solving algorithm
void go()
{
    // start by finding the sum of distances to the center of gravity
    double d = dist(x, y);

    // our step value, chosen by experimentation
    double step = 100.0;

    // done is used to keep track of updates: if none of the neighbors of the
    // current point that are *step* away improve the solution, then *step* is
    // too big and we need to look closer to the current point, so we must
    // halve *step*.
    int done = 0;

    // while we still need a more precise answer
    while ( step > eps )
    {
        done = 0;
        for ( int i = 0; i < 4; ++i )
        {
            // check the neighbors in all 4 directions
            double nx = x + step*dx[i];
            double ny = y + step*dy[i];

            // find the sum of distances for this neighbor
            double t = dist(nx, ny);

            // if a neighbor offers a better sum of distances
            if ( t < d )
            {
                // update the current minimum
                d = t;
                x = nx;
                y = ny;

                // an improvement has been made, so don't halve *step* in the
                // next iteration, because we might need to jump the same
                // amount again
                done = 1;
                break;
            }
        }

        // no update has been made, so we might have jumped too much; halve
        // the step size and look closer to the current point.
        if ( !done )
            step /= 2;
    }
}

int main()
{
    read();
    go();

    // print the answer with 4 decimal places
    fprintf(out, "%.4lf %.4lf\n", x, y);

    return 0;
}

I think it is then correct to pick the point from your list that is closest to the (x, y) returned by this algorithm.
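A minimal sketch of that selection step, reusing the globals a[], n, x and y from the program above (the function name closestInputPoint is mine, not part of the original solution):

// returns the index of the input point closest to the approximate
// geometric median (x, y) computed by go()
int closestInputPoint()
{
    int best = 1;
    double bestDist = -1;

    for ( int i = 1; i <= n; ++i )
    {
        double dx = a[i].x - x;
        double dy = a[i].y - y;
        double d = dx*dx + dy*dy; // comparing squared distances avoids the sqrt

        if ( bestDist < 0 || d < bestDist )
        {
            bestDist = d;
            best = i;
        }
    }

    return best;
}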

This algorithm takes advantage of what this Wikipedia paragraph on the geometric median says:

However, it is straightforward to calculate an approximation to the geometric median using an iterative procedure in which each step produces a more accurate approximation. Procedures of this type can be derived from the fact that the sum of distances to the sample points is a convex function, since the distance to each sample point is convex and the sum of convex functions remains convex. Therefore, procedures that decrease the sum of distances at each step cannot get trapped in a local optimum.

One common approach of this type, called Weiszfeld's algorithm after the work of Endre Weiszfeld,[4] is a form of iteratively re-weighted least squares. This algorithm defines a set of weights that are inversely proportional to the distances from the current estimate to the samples, and creates a new estimate that is the weighted average of the samples according to these weights. That is,

y_{k+1} = ( sum_i x_i / ||x_i - y_k|| ) / ( sum_i 1 / ||x_i - y_k|| )
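For illustration, here is a minimal sketch of one such iteration (this is not the judge solution above; the epsilon guard is my addition, to avoid dividing by zero when the estimate lands exactly on a sample point):

#include <cmath>
#include <vector>

struct punct { double x, y; };

// one Weiszfeld step: the new estimate is the average of the samples,
// weighted by the inverse of their distance to the current estimate
punct weiszfeldStep(const std::vector<punct>& pts, punct est)
{
    double wx = 0, wy = 0, wsum = 0;

    for ( const punct& p : pts )
    {
        double ddx = p.x - est.x;
        double ddy = p.y - est.y;
        double d = std::sqrt(ddx*ddx + ddy*ddy);
        double w = 1.0 / (d + 1e-12); // inverse-distance weight, with epsilon guard

        wx += w * p.x;
        wy += w * p.y;
        wsum += w;
    }

    return punct{ wx / wsum, wy / wsum };
}

Starting from the center of gravity and iterating until the estimate moves by less than eps gives the same kind of approximation as the step-halving search above.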

The first paragraph above explains why this works: because the function we are trying to optimize does not have any local minima, you can greedily find the minimum by iteratively improving the current estimate.

Think of this as a sort of binary search. First, you approximate the result. A good approximation is the center of gravity, which my code computes when reading the input. Then, you check whether points adjacent to it give you a better solution. Here, a point is considered adjacent if it is at a distance of *step* away from your current point. If it is better, then it is fine to discard your current point, because, as I said, this will not trap you in a local minimum, due to the nature of the function you are trying to minimize.

After this, you halve the step size, just like in binary search, and continue until you have what you consider to be a good enough approximation (controlled by the eps constant).

The complexity of the algorithm therefore depends on how accurate you want the result to be.

answered by IVlad


It appears that the problem is difficult to solve in better than O(n^2) time when using Euclidean distances. However, the point that minimizes the sum of Manhattan distances to the other points, or the point that minimizes the sum of squares of Euclidean distances to the other points, can be found in O(n log n) time (assuming that multiplying two numbers takes O(1) time). Let me shamelessly copy/paste my solution for Manhattan distances from a recent post:

Create a sorted array of x-coordinates and for each element in the array compute the "horizontal" cost of choosing that coordinate. The horizontal cost of an element is the sum of distances to all the points projected onto the X-axis. This can be computed in linear time by scanning the array twice (once from left to right and once in the reverse direction). Similarly create a sorted array of y-coordinates and for each element in the array compute the "vertical" cost of choosing that coordinate.

Now for each point in the original array, we can compute the total cost to all other points in O(1) time by adding the horizontal and vertical costs. So we can compute the optimal point in O(n). Thus the total running time is O(n log n).
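A minimal self-contained sketch of this approach; it uses prefix sums over the sorted coordinates instead of the two explicit scans described above, which yields the same per-coordinate costs (the sample points in main are made up for demonstration):

#include <algorithm>
#include <cstdio>
#include <vector>

// sum of |q - v| over all values v in the sorted array, where
// pref[k] = sorted[0] + ... + sorted[k-1]
double axisCost(const std::vector<double>& sorted,
                const std::vector<double>& pref, double q)
{
    int n = (int)sorted.size();
    int k = (int)(std::upper_bound(sorted.begin(), sorted.end(), q) - sorted.begin());
    double left  = k * q - pref[k];                   // distances to values <= q
    double right = (pref[n] - pref[k]) - (n - k) * q; // distances to values >  q
    return left + right;
}

int main()
{
    std::vector<double> px = {0, 3, 1, 5}, py = {0, 4, 1, 2}; // made-up sample points
    int n = (int)px.size();

    std::vector<double> xs = px, ys = py;
    std::sort(xs.begin(), xs.end());
    std::sort(ys.begin(), ys.end());

    // prefix sums over the sorted coordinates
    std::vector<double> prefX(n + 1, 0), prefY(n + 1, 0);
    for ( int i = 0; i < n; ++i )
    {
        prefX[i + 1] = prefX[i] + xs[i];
        prefY[i + 1] = prefY[i] + ys[i];
    }

    int best = 0;
    double bestCost = -1;
    for ( int i = 0; i < n; ++i )
    {
        // total Manhattan cost = horizontal cost + vertical cost
        double cost = axisCost(xs, prefX, px[i]) + axisCost(ys, prefY, py[i]);
        if ( bestCost < 0 || cost < bestCost )
        {
            bestCost = cost;
            best = i;
        }
    }

    printf("best point: (%g, %g), total cost %g\n", px[best], py[best], bestCost);
    return 0;
}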

We can follow a similar approach for computing the point that minimizes the sum of squares of Euclidean distances to other points. Let the sorted x-coordinates be x_1, x_2, ..., x_n. We scan this list from left to right and for each point x_i we compute:

l_i = sum of distances to all the elements to the left of x_i = (x_i - x_1) + (x_i - x_2) + ... + (x_i - x_{i-1}), and

sl_i = sum of squares of distances to all the elements to the left of x_i = (x_i - x_1)^2 + (x_i - x_2)^2 + ... + (x_i - x_{i-1})^2

Note that given l_i and sl_i we can compute l_{i+1} and sl_{i+1} in O(1) time as follows:

Let d = x_{i+1} - x_i. Then:

l_{i+1} = l_i + i*d and sl_{i+1} = sl_i + i*d^2 + 2*d*l_i

(Each of the i squared terms (x_{i+1} - x_j)^2 = ((x_i - x_j) + d)^2 expands to (x_i - x_j)^2 + 2*d*(x_i - x_j) + d^2; the middle terms sum to 2*d*l_i and the last terms to i*d^2.)

Thus we can compute all the l_i and sl_i in linear time by scanning from left to right. Similarly, for every element we can compute r_i (the sum of distances to all the elements to the right) and sr_i (the sum of squares of distances to all the elements to the right) in linear time. Adding sl_i and sr_i for each i then gives the sum of squares of horizontal distances to all the elements, in linear time. Similarly, compute the sum of squares of vertical distances to all the elements.

Then we can scan through the original points array and find the point that minimizes the sum of squares of vertical and horizontal distances as before.
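A minimal sketch of the left-to-right pass, written 0-based so that the number of elements to the left of xs[i+1] is i+1 (the right-to-left pass is the same scan over the reversed array, and the vertical component repeats both passes on the sorted y-coordinates):

#include <vector>

// given sorted coordinates xs, returns sl where sl[i] is the sum of
// squared distances from xs[i] to all elements on its left, using
//   l_{i+1}  = l_i  + cnt*d
//   sl_{i+1} = sl_i + cnt*d*d + 2*d*l_i
// with d = xs[i+1] - xs[i] and cnt = i+1 elements to the left of xs[i+1]
std::vector<double> leftSquaredSums(const std::vector<double>& xs)
{
    int n = (int)xs.size();
    std::vector<double> sl(n, 0.0);
    double l = 0.0; // running sum of distances to the elements on the left

    for ( int i = 0; i + 1 < n; ++i )
    {
        double d = xs[i + 1] - xs[i];
        int cnt = i + 1;

        sl[i + 1] = sl[i] + cnt * d * d + 2.0 * d * l;
        l += cnt * d;
    }

    return sl;
}

For example, for xs = {0, 1, 3} this returns {0, 1, 13}, matching (1-0)^2 = 1 and (3-0)^2 + (3-1)^2 = 13.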

answered by krjampani