For example, I have this sample data:
d=data.frame(x=c(1,1,1,2,2,3,4,4),y=c(5,6,7,8,7,5,6,5),w=c(1,2,3,4,5,6,7,8))
Which looks like this:
x y w
1 1 5 1
2 1 6 2
3 1 7 3
4 2 8 4
5 2 7 5
6 3 5 6
7 4 6 7
8 4 5 8
x
and y
represent indices from datax
and datay
. w
represents a score from comparing datax[x]
with datay[y]
. I want to maximize the total score (or w
) from d
, where each value of x
is matched to at most one value of y
, and vice versa.
The result should look like this:
x y w
1 2 7 5
2 3 5 6
3 4 6 7
Where the sum of all w
values is maximized, and each x
and each y
show up only once in the result.
How do I set this problem up in the lpSolve::lp
function?
You can use lpSolveAPI to solve your problem. Your stated solution is not quite feasible given your constraints. So let's go with you wanting X's and Y's to not repeat in the solution.
You will need 8 new binary variables. Each variables specifies whether or not that row in d
is getting picked (1) or dropped (0).
Yes, the lpSolveAPI code (below) makes it look more complicated than it really is. This LP formulation (output of lpSolveAPI) should make things clearer:
/* Objective function */
max: +pick_1 +2 pick_2 +3 pick_3 +4 pick_4 +5 pick_5 +6 pick_6 +7 pick_7 +8 pick_8;
/* Constraints */
OneX_1: +pick_1 +pick_2 +pick_3 <= 1;
OneX_2: +pick_4 +pick_5 <= 1;
OneX_4: +pick_7 +pick_8 <= 1;
OneY_5: +pick_1 +pick_6 +pick_8 <= 1;
OneY_6: +pick_2 +pick_7 <= 1;
OneY_7: +pick_3 +pick_5 <= 1;
/* Variable bounds */
pick_1 <= 1;
pick_2 <= 1;
pick_3 <= 1;
pick_4 <= 1;
pick_5 <= 1;
pick_6 <= 1;
pick_7 <= 1;
pick_8 <= 1;
Explanation: The second constraint (OneX_2) simply states that only one of pick_4
or pick_5
can be 1, since the 4th and 5th rows in the data frame d
have X = 2
Note that the formulation above produces an optimal solution that selects 4 rows in the data frame d
> d[c(3,4,6,7),]
x y w
3 1 7 3
4 2 8 4
6 3 5 6
7 4 6 7
The sum of w's is 20, which is better than the solution in the question.
library(lpSolveAPI)
d <- data.frame(x=c(1,1,1,2,2,3,4,4),y=c(5,6,7,8,7,5,6,5),w=c(1,2,3,4,5,6,7,8))
ncol <- 8 #you have eight rows that can be picked or dropped from the solution set
lp_rowpicker <- make.lp(ncol=ncol)
set.type(lp_rowpicker, columns=1:ncol, type = c("binary"))
obj_vals <- d[, "w"]
set.objfn(lp_rowpicker, obj_vals)
lp.control(lp_rowpicker,sense='max')
#Add constraints to limit X values from repeating
add.constraint(lp_rowpicker, xt=c(1,1,1), #xt specifies which rows of the LP
indices=c(1,2,3), rhs=1, type="<=")
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
indices=c(4,5), rhs=1, type="<=")
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
indices=c(7,8), rhs=1, type="<=") #x's in dataframe rows 7 & 8 are both '4'
#Add constraints to limit Y values from repeating
add.constraint(lp_rowpicker, xt=c(1,1,1), #xt specifies which rows of the LP
indices=c(1,6,8), rhs=1, type="<=") #Y's in df rows 1,6 & 8 are all '5'
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
indices=c(2,7), rhs=1, type="<=") #Y's in dataframe rows 2&7 are both '6'
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
indices=c(3,5), rhs=1, type="<=") #y's in dataframe rows 3&5 are both '7'
solve(lp_rowpicker)
get.objective(lp_rowpicker) #20
get.variables(lp_rowpicker)
#[1] 0 0 1 1 0 1 1 0
#This tells you that from d you pick rows: 3,4,6 & 7 in your optimal solution.
#If you want to look at the full formulation:
rownames1 <- paste("OneX", c(1,2,4), sep="_")
rownames2 <- paste("OneY", c(5,6,7), sep="_")
colnames<- paste("pick_",c(1:8), sep="")
dimnames(lp_rowpicker) <- list(c(rownames1, rownames2), colnames)
print(lp_rowpicker)
#write it to a text file
write.lp(lp_rowpicker,filename="max_w.lp")
Hope this gives you an idea of how to use lpSolveAPI to formulate your problem.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With