Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Need pairing algorithm - based on Hungarian?

Hungarian or Kuhn-Munkres algorithm (good description here) pairs objects from two sets (of n and m objects respectively, n>=m) so that the overall "difference" (or "cost" of assignment) between paired objects be minimal. One feature of the algo doesn't suit me however: it does only exhaustive pairing, in the sense that it will pair all m objects with some of n objects. Instead of this, I'd want to be able to create arbitrary number k of pairs (k<=m) with overall cost minimal. For example, there is a 50x30 input cost matrix; Kuhn-Munkres will optimally create but all 30 pairs. While I need just 20 pairs to be created such optimally.

Can there be any modification of Hungarian algorithm allowing for this, or maybe a totally another algo to do it? I appreciate your answers highly.

like image 635
ttnphns Avatar asked Jul 29 '11 08:07

ttnphns


1 Answers

Here are a few ideas to think about:

1) Suppose you write down your cost matrix with n columns and m rows. If n is greater than m you add padding rows with constant large cost to make it square. A minimum cost assignment of rows and columns will now discard some columns by matching them to padding rows. Suppose you now add a padding column with very low cost for the ordinary rows and the constant large cost for the padding columns. The solution will now match one of the proper rows to this column, to take advantage of the very low cost. This reduces the number of rows that match to something sensible. I think if you add m-k such columns you will end up with a minimum cost matching that really assigns only k of the rows.

Here is an example of pairing 3 with 3 in 5x5, assuming ?
marks problem-specific values > 0 but < 100 (you may 
need more extreme values than 0 and 100 to force the sort of
solution you want depending on what your data values are).

?   ?   ?   ?   ?   0   0
?   ?   ?   ?   ?   0   0
?   ?   ?   ?   ?   0   0
?   ?   ?   ?   ?   0   0
?   ?   ?   ?   ?   0   0
100 100 100 100 100 100 100
100 100 100 100 100 100 100

I expect that an optimal solution will use two 0s from the far right and two 100s from the bottom rows. The remaining cells are a 3 x 3 matching within the square of ?s

OK - here is a proof that adding columns and then rows as above produces the sort of matching you want:

Suppose you take a cost matrix with values 0 < x < 100 and add a border of s columns and rows of 0s and 100s as above, then solve it as an assignment problem. Draw two lines at the border of the 0s and 100s, extending them to cut the square into four regions, where the region at the top left is the original matrix. If the assignment algorithm didn't choose any of the cells in the bottom right region then it chose s cells in the top right region (to pick the s rightmost columns), so s rows in the orgininal cost matrix in the top left region are paired with cells in a zero column. The other rows in the top region must be paired with a non-zero column, so you have a matching in the original region that leaves s rows, and so s columns, unpaired (that is, paired with a zero cell).

Is it possible that the assigment solution has any cells in the s x s lower right region chosen? Consider any such assignment. To prove that at least one cell in the upper left region must be chosen, suppose none are chosen. Then we must somehow choose a cell from each of the top n rows, presumably by picking cells from the top right region. Each such cell must be in a separate column, but there are only s columns in the top right region, which won't be enough because we need only one column for each matching we want to skip, and we have used one column in this region already to fill in a cell in the lower right region. So suppose the solution chooses at least one cell in the original upper left region and at least one cell in the lower right region. Pick the two other cells that make this into four corners of a square. These cells cannot be chosen. If we choose those cells instead of the two that are currently chosen, we get a different solution. The two new cells are a 0 cell from the top right and a 100 cell from the bottom left. They would replace a 100 cell from the bottom right and a cell of value greater than zero in the main matrix. So this would make our supposed solution better, so any solution that contains a cell in the bottom right region is not a best solution, and the assignment algorithm will not return it to us.

So this trick of adding columns of 0s and then rows of large values will produce an assignment algorithm solution that does omits one matching from the original solution for each (row, column) added.

2) The assignment problem is a special case of the http://en.wikipedia.org/wiki/Minimum-cost_flow_problem. I think you want a minimum cost flow that transfers k units from rows to columns, so you could try solving it like this.

3) The minimum cost flow problem is a special case of linear programming. I think you could write down a linear program to assign numbers in the range [0,1] to cells of the matrix such that each row and each column sums up to no more than 1 and the total of all the cells is k. The objective function is then the number in each cell times its cost.

like image 177
mcdowella Avatar answered Oct 27 '22 02:10

mcdowella