Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tricky Median Question

Given n points, choose a point in the given list such that the sum of distances to this point is minimum ,compared to all others.

Distance is measured in the following manner.

For a point (x,y) all 8 adjacent points have distance 1.

(x+1,y)(x+1,y+1),(x+1,y-1),(x,y+1),(x,y-1),(x-1,y)(x-1,y+1),(x-1,y-1)

EDIT

More clearer explanation.

A function foo is defined as

foo(point_a,point_b) = max(abs(point_a.x - point_b.x),abs(point_a.y - point_b.y))

Find a point x such that sum([foo(x,y) for y in list_of_points]) is minimum.

Example

Input:

12 -14
-3 3
-14 7
-14 -3
2 -12
-1 -6

Output

-1 -6

Eg: Distance between (4,5) and 6,7) is 2.

This can be done in O(n^2) time, by checking the sum of each pair. Is there any better algorithm to do it?

like image 575
Greyhound Avatar asked Aug 14 '11 20:08

Greyhound


People also ask

How do you solve a median question?

For a set of two numbers, the value of the median will be the same as the value of the mean. For example, 2 and 10 are the two numbers. Also, mean = (2+10)/2 = 12/2 = 6. Hence, the median of two numbers is equal to the mean.

What is median Class 7 maths?

Median, in statistics, is the middle value of the given list of data, when arranged in an order. The arrangement of data or observations can be done either in ascending order or descending order.

How do you find the mean median and mode Questions?

For example, take this list of numbers: 10, 10, 20, 40, 70. The mean (informally, the “average“) is found by adding all of the numbers together and dividing by the number of items in the set: 10 + 10 + 20 + 40 + 70 / 5 = 30. The median is found by ordering the set from lowest to highest and finding the exact middle.


2 Answers

Update: it sometimes fails to find the optimum, I'll leave this here till I find the problem.

this is O(n): nth is O(n) (expected, not worst), iterating over the list is O(n). If you need strict O() then pick the middle element with sorting but then it's going to be O(n*log(n)).

Note: it's easy to modifiy it to return all the optimal points.

import sys

def nth(sample, n):
    pivot = sample[0]
    below = [s for s in sample if s < pivot]
    above = [s for s in sample if s > pivot]
    i, j = len(below), len(sample)-len(above)
    if n < i:      return nth(below, n)
    elif n >= j:   return nth(above, n-j)
    else:          return pivot

def getbest(li):
    ''' li is a list of tuples (x,y) '''
    l = len(li)
    lix = [x[0] for x in li]
    liy = [x[1] for x in li]

    mid_x1 = nth(lix, l/2) if l%2==1 else nth(lix, l/2-1)
    mid_x2 = nth(lix, l/2)
    mid_y1 = nth(liy, l/2) if l%2==1 else nth(liy, l/2-1)
    mid_y2 = nth(liy, l/2)

    mindist = sys.maxint
    minp = None
    for p in li:
        dist = 0 if mid_x1 <= p[0] <= mid_x2 else min(abs(p[0]-mid_x1), abs(p[0]-mid_x2))
        dist += 0 if mid_y1 <= p[1] <= mid_y2 else min(abs(p[1]-mid_y1), abs(p[1]-mid_y2))
        if dist < mindist:
            minp, mindist = p, dist
    return minp

It's based on the solution of the one dimensional problem - for a list of numbers find a number for which the sum distance is the minimum.

The solution for this is the middle element of the (sorted) list or any number between the two middle elements (including these two elements) if there are an even number of elements in the list.

Update: my nth algorithm seems to be very slow, probably there is a better way to rewrite it, sort outperforms it with < 100000 elements, so if you do speed comparison, just add sort(lix); sort(liy); and

def nth(sample, n):
    return sample[n]

For anyone out there who wants to test his solution, here is what I use. Just run a loop, generate input and compare your solution with the output of bruteforce.

import random
def example(length):
    l = []
    for x in range(length):
        l.append((random.randint(-100, 100), random.randint(-100,100)))
    return l

def bruteforce(li):
    bestsum = sys.maxint
    bestp = None
    for p in li:
        sum = 0
        for p1 in li:
            sum += max(abs(p[0]-p1[0]), abs(p[1]-p1[1]))
        if sum < bestsum:
            bestp, bestsum = p, sum
    return bestp
like image 107
Karoly Horvath Avatar answered Sep 26 '22 22:09

Karoly Horvath


I can imagine a scheme better than O(n^2), at least in the common case.

Build a quadtree out of your input points. For each node in the tree, compute the number and average position of the points within that node. Then for each point, you can use the quadtree to compute its distance to all other points in less than O(n) time. If you're computing the distance from a point p to a distant quadtree node v, and v doesn't overlap the 45 degree diagonals from p, then the total distance from p to all the points in v is easy to compute (for v which are more horizontally than vertically separated from p, it is just v.num_points * |p.x - v.average.x|, and similarly using y coordinates if v is predominately vertically seperated). If v overlaps one of the 45 degree diagonals, recurse on its components.

That should beat O(n^2), at least when you can find a balanced quadtree to represent your points.

like image 26
Keith Randall Avatar answered Sep 23 '22 22:09

Keith Randall