I have 2 sets of integers, A and B, not necessarily of the same size. For my needs, I take the distance between each 2 elements a and b (integers) to be just abs(a-b)
.
I am defining the distance between the two sets as follows:
My question is, is the following algorithm (just an intuitive guess) gives the right answer, according to the definition written above.
D
of size m X n
, with D(i,j) = abs(A(i)-B(j))
D
, accumulate it, and delete the row and the column of that element. Accumulate the next smallest entry, and keep accumulating until all rows and columns are deleted.for example, if A={0,1,4}
and B={3,4}
, then D
is (with the elements above and to the left):
3 4
0
3 4
1
2 3
4
1 0
And the distance is 0 + 2 = 2
, coming from pairing 4
with 4
and 3
with 1
.
Note that this problem is referred to sometimes as the skis and skiers problem, where you have n skis and m skiers of varying lengths and heights. The goal is to match skis with skiers so that the sum of the differences between heights and ski lengths is minimized.
To solve the problem you could use minimum weight bipartite matching, which requires O(n^3) time.
Even better, you can achieve O(n^2) time with O(n) extra memory using the simple dynamic programming algorithm below.
Optimally, you can solve the problem in linear time if the points are already sorted using the algorithm described in this paper.
O(n^2) dynamic programming algorithm:
if (size(B) > size(A))
swap(A, B);
sort(A);
sort(B);
opt = array(size(B));
nopt = array(size(B));
for (i = 0; i < size(B); i++)
opt[i] = abs(A[0] - B[i]);
for (i = 1; i < size(A); i++) {
fill(nopt, infinity);
for (j = 1; j < size(B); j++) {
nopt[j] = min(nopt[j - 1], opt[j - 1] + abs(A[i] - B[j]));
swap(opt, nopt);
}
return opt[size(B) - 1];
After each iteration i
of the outer for
loop above, opt[j]
contains the optimal solution matching {A[0],..., A[i]}
using the elements {B[0],..., B[j]}
.
The correctness of this algorithm relies on the fact that in any optimal matching if a1 is matched with b1, a2 is matched with b2, and a1 < a2, then b1 <= b2.
In order to get the optimum, solve the assignment problem on D
.
The assignment problem finds a perfect matching in a bipartite graph such that the total edge weight is minimized, which maps perfectly to your problem. It is also in P.
EDIT to explain how OP's problem maps onto assignment.
For simplicity of explanation, extend the smaller set with special elements e_k
.
Let A be the set of workers, and B be the set of tasks (the contents are just labels).
Let the cost be the distance between an element in A and B (i.e. an entry of D
). The distance between e_k
and anything is 0.
Then, we want to find a perfect matching of A and B (i.e. every worker is matched with a task), such that the cost is minimized. This is the assignment problem.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With