Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Distance measure between two sets of possibly different size

I have 2 sets of integers, A and B, not necessarily of the same size. For my needs, I take the distance between each 2 elements a and b (integers) to be just abs(a-b).

I am defining the distance between the two sets as follows:

  1. If the sets are of the same size, minimize the sum of distances of all pairs [a,b] (a from A and b from B), minimization over all possible 'pairs partitions' (there are n! possible partitions).
  2. If the sets are not of the same size, let's say A of size m and B of size n, with m < n, then minimize the distance from (1) over all subsets of B which are of size m.

My question is, is the following algorithm (just an intuitive guess) gives the right answer, according to the definition written above.

  • Construct a matrix D of size m X n, with D(i,j) = abs(A(i)-B(j))
  • Find the smallest element of D, accumulate it, and delete the row and the column of that element. Accumulate the next smallest entry, and keep accumulating until all rows and columns are deleted.

for example, if A={0,1,4} and B={3,4}, then D is (with the elements above and to the left):

3 4

0 3 4

1 2 3

4 1 0

And the distance is 0 + 2 = 2, coming from pairing 4 with 4 and 3 with 1.

like image 836
Itamar Katz Avatar asked Dec 14 '10 16:12

Itamar Katz


2 Answers

Note that this problem is referred to sometimes as the skis and skiers problem, where you have n skis and m skiers of varying lengths and heights. The goal is to match skis with skiers so that the sum of the differences between heights and ski lengths is minimized.

To solve the problem you could use minimum weight bipartite matching, which requires O(n^3) time.

Even better, you can achieve O(n^2) time with O(n) extra memory using the simple dynamic programming algorithm below.

Optimally, you can solve the problem in linear time if the points are already sorted using the algorithm described in this paper.

O(n^2) dynamic programming algorithm:

if (size(B) > size(A))
  swap(A, B);
sort(A);
sort(B);
opt = array(size(B));
nopt = array(size(B));
for (i = 0; i < size(B); i++)
  opt[i] = abs(A[0] - B[i]);
for (i = 1; i < size(A); i++) {
  fill(nopt, infinity);
  for (j = 1; j < size(B); j++) {
    nopt[j] = min(nopt[j - 1], opt[j - 1] + abs(A[i] - B[j]));
  swap(opt, nopt);
}
return opt[size(B) - 1];

After each iteration i of the outer for loop above, opt[j] contains the optimal solution matching {A[0],..., A[i]} using the elements {B[0],..., B[j]}.

The correctness of this algorithm relies on the fact that in any optimal matching if a1 is matched with b1, a2 is matched with b2, and a1 < a2, then b1 <= b2.

like image 62
jonderry Avatar answered Sep 18 '22 23:09

jonderry


In order to get the optimum, solve the assignment problem on D.

The assignment problem finds a perfect matching in a bipartite graph such that the total edge weight is minimized, which maps perfectly to your problem. It is also in P.

EDIT to explain how OP's problem maps onto assignment.

For simplicity of explanation, extend the smaller set with special elements e_k.

Let A be the set of workers, and B be the set of tasks (the contents are just labels).

Let the cost be the distance between an element in A and B (i.e. an entry of D). The distance between e_k and anything is 0.

Then, we want to find a perfect matching of A and B (i.e. every worker is matched with a task), such that the cost is minimized. This is the assignment problem.

like image 36
lijie Avatar answered Sep 17 '22 23:09

lijie