I have a problem similar to subset sum, but with a few modifications:
Subset Sum: Find all ways the following list can sum to 16:
[2,9,40,2,404,12....]
My Problem:
Find all ways you can select n items (integer, discrete), each of which has 3 properties - length, width, and height - such that the combined total of each property is as close to a specific target as possible.
Example:
items = [(92,10,15),(8,18,34),(29,50,110) ...]
l,w,h = [150,200,180]
n = 3
We select 3 items such that the sums of the lengths, widths, and heights across all 3 items are as close as possible to 150, 200, and 180. This error can be formulated as a simple distance metric:
error = sum(abs(x-y) for x,y in zip(targets,actuals))
This is a contrived example but the general problem I'm solving is the same, just with more variables and constraints.
One obvious solution is just an exhaustive graph search of some sort, but I expect there are superior approaches, perhaps in the constraint programming domain.
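For scale, here is a minimal brute-force sketch of that exhaustive baseline (illustrative only - not the code I actually run), which makes clear why it cannot scale: it enumerates all C(len(items), n) combinations.

from itertools import combinations

def brute_force(items, targets, n):
    # Enumerate every n-item combination and keep the lowest-error one.
    # C(len(items), n) combinations makes this feasible only for tiny inputs.
    best, best_err = None, float('inf')
    for combo in combinations(items, n):
        actuals = [sum(prop) for prop in zip(*combo)]  # per-property totals
        err = sum(abs(t - a) for t, a in zip(targets, actuals))
        if err < best_err:
            best, best_err = combo, err
    return best, best_err

print(brute_force([(92, 10, 15), (8, 18, 34), (29, 50, 110), (60, 80, 40)], [150, 200, 180], 3))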
Note: The solution should execute in a few seconds. Finding an approximate solution quickly with low error is more important than finding the global optimum. It doesn't matter whether we undershoot or overshoot the targets, which is why the error uses absolute distance.
What is a "low error"? I'm able to get within ~5% of the targets when selecting from 1M items, in < 3 seconds. I would prefer to get that error down if possible, but 1%-5% is acceptable.
Please take into consideration my prior attempts outlined below before recommending an approach.
After a day of experimentation and research, the best approach I've found was also the simplest one: just repeatedly trying random configurations until I got an acceptable solution (i.e. "Random Restart Hill Climbing"). Simulated annealing works well too, but I haven't noticed a big enough improvement to justify the extra complexity compared to RRHC - though I admittedly did not spend any real effort tuning the parameters (temperature, schedule, etc.).
I should say that I'm fairly happy with this approach so far: it returns a "good enough" answer pretty quickly, it is tunable between a more exhaustive search and a faster one (more vs. fewer trials), and it is flexible in what I accept as "good enough", in the sense that I can just modify the "cost" function - in this case a simple distance between the target l,w,h and the current l,w,h. If, for example, I wanted to penalize being off on height more severely than length or width, I could simply multiply (h_target - h_actual) by 2 and give more weight to that part of the cost.
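To make the RRHC idea concrete, here is a minimal sketch of the approach (an illustration of the scheme described above, not my exact production code): restart from a random n-subset, greedily accept single-item swaps that lower the cost, and keep the best subset found within a time budget.

import random
import time

def cost(subset, targets, weights=(1, 1, 1)):
    # Weighted absolute miss across the three properties; bump a weight to
    # penalize that property more severely (e.g. weights=(1, 1, 2) for height).
    actuals = [sum(prop) for prop in zip(*subset)]
    return sum(w * abs(t - a) for w, t, a in zip(weights, targets, actuals))

def rrhc(items, targets, n, time_budget=3.0, swaps_per_climb=50):
    best, best_cost = None, float('inf')
    deadline = time.monotonic() + time_budget
    while time.monotonic() < deadline:
        current = random.sample(items, n)  # random restart
        improved = True
        while improved and time.monotonic() < deadline:
            improved = False
            for _ in range(swaps_per_climb):
                # Propose replacing one selected item with a random item.
                # (For simplicity this may introduce a duplicate; a real
                # version would swap in only unselected items.)
                candidate = current.copy()
                candidate[random.randrange(n)] = random.choice(items)
                if cost(candidate, targets) < cost(current, targets):
                    current, improved = candidate, True
        if cost(current, targets) < best_cost:
            best, best_cost = current, cost(current, targets)
    return best, best_cost

# Example: items = [(l, w, h), ...]; find 3 items near (150, 200, 180)
# best, err = rrhc(items, [150, 200, 180], 3, time_budget=1.0)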
That said, for educational purposes I want to go through the other approaches I tried, and see if maybe someone can point out where I went wrong in my backtracking approach specifically. Or alternatively, if I didn't miss anything and maybe RRHC is just a better algorithm for this type of problem, that would be nice to know as well.
This is not the first time I've found RRHC to work well for problems where you need a "good enough" solution and where finding the global optimum is "icing on the cake". In my experience it's a hammer for a lot of nails - surprisingly effective given its simplicity. One extension I might try later is running multiple search threads in parallel.
Backtracking: I run a backtracking search: pick an item, check that all constraints are still good, recurse to pick the next item, and backtrack whenever a constraint is violated. "All constraints good" simply means I haven't exceeded the targets. Additionally, at each selection step, I filter the list of furniture so that any items which would exceed the remaining l,w,h are removed.
For example, after the 2nd selection we have assigned 30+100, 20+50, 40+120 = 130, 70, 160, leaving a remaining l,w,h to allocate of 140-130, 80-70, 300-160 = 10, 10, 140. Any item which has l>10, w>10, or h>140 would be filtered out before we continue the search.
The problem is that while this filtering will exclude any items that would conflict at the immediate next step, it does not consider that a certain state will eventually become a dead end at a later step. If the only pieces of furniture left all have a length of 10, they pass this filtering step, yet we know the current arrangement cannot possibly lead to a solution: by the fourth step there is no budget left for length, since we will have exhausted it on step 3 when we assign 10, leaving zero remaining length.
I can do a form of "forward checking" at every step. Continuing the example: after the 2nd selection we have 140 of h remaining, but if the max h among the remaining items is 60, we know we can never arrive at a solution, since 2*60 = 120 < 140. In general, at every step we filter the furniture list such that, for each property, (selections remaining) * (max value of that property in the filtered list) >= the remaining budget for that property.
I think this would work, and terminate parts of the search space which will not be feasible much higher up in the search tree, but it feels hacky.
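A sketch of that filter (my illustration of the scheme just described, assuming `remaining` holds the unallocated budget per property and `steps_left` counts the selections still to be made):

def forward_check_filter(pool, remaining, steps_left):
    # Step 1: drop items that would overshoot any remaining budget right now.
    pool = [item for item in pool
            if all(v <= r for v, r in zip(item, remaining))]
    if not pool:
        return pool
    # Step 2 (the "forward check"): after hypothetically taking `item`, the
    # most we can add per property in the remaining steps is (steps_left - 1)
    # times the per-property maximum; if even that cannot reach the leftover
    # budget, `item` leads to a dead end and is pruned.
    # (Approximation: maxima are taken over the whole filtered pool rather
    # than pool-minus-item.)
    maxima = [max(vals) for vals in zip(*pool)]
    return [item for item in pool
            if all((steps_left - 1) * m >= r - v
                   for m, r, v in zip(maxima, remaining, item))]

# Continuing the example: 140 of h left, 2 selections to go, max h of 60
# per item -> every item gets pruned, correctly flagging the dead end.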
Mixed Integer Linear Programming: Since I have to select n items, there is a clear integer constraint, meaning pure linear solvers and, more generally, convex optimization are out the window. I thought MILP might be a possible approach, but after some research I'm not so sure. For one, I cannot think of a way to formulate this as a MILP: the only knob we have to turn is which pieces we select, and whatever l,w,h properties a piece has are a given. For that reason it seemed more of a search problem than a MILP, but that could be due to my inexperience with formulating problems in the proper form.
Constraint Satisfaction Algorithms: Backtracking search while checking constraints is actually in this family of algorithms. I think my proposed "look ahead" is vaguely reminiscent of backtracking search with the "forward checking" heuristic. There are other approaches involving arc consistency, picking the least/most constrained variable first, etc., but similar to MILP it wasn't clear to me whether this problem could be formulated in a way that leverages those options.
This is a slight variation of the knapsack problem, which is very common in Linear Programming: picking things based on some value, with some limit on another characteristic.
In this case, we want to use the total "miss distance" across the three dimensions as the valuation (lower = better). Because we want to count both low misses and high misses, we introduce a variable to capture the miss in each dimension, and constrain it to be at least as large as a possible "low miss" (target > sum) and also at least as large as a "high miss" (sum > target). Then we just take the sum of the misses in all 3 dimensions.
Many variants are possible. You could switch to "Price is Right" scoring and not allow the dimensional totals to be greater than their limits by adding those constraints, etc.
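For instance (a sketch reusing the variable names from the full program below), forbidding any overshoot is just one extra constraint per dimension, after which the miss variables only ever capture undershoot:

# "Price is Right" variant: no dimensional total may exceed its target
for dim in D:
    prob += pulp.lpSum(boxes[box][dim] * pick[box] for box in B) <= targets[dim]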
Anyhow, this is a fairly straightforward MILP. It solves for 1000 boxes almost instantly using the commented-out "GO BIG" settings. (Do NOT try that with enumeration: C(2000, 1000) is a very large number.) The problem becomes more complicated if you don't restrict the number of boxes selected, but that's another story.
Note: As stated in the comment, I need to point at a separate instance of the cbc solver on my machine. If you install pulp to replicate this, it will install a default copy of cbc, and you should just need the one-liner shown beneath.
# box selection
from random import randint

import pulp

num_boxes = 10
num_selections = 5
targets = {'l': 200, 'w': 600, 'h': 300}

# GO BIG
# num_boxes = 2000
# num_selections = 1000
# targets = {'l': 40_000, 'w': 120_000, 'h': 60_000}

# Make a list of boxes to pick from
boxes = [{'l': randint(20, 80), 'w': randint(100, 200), 'h': randint(50, 65)} for _ in range(num_boxes)]

# SET UP THE LP
prob = pulp.LpProblem('box_picker', sense=pulp.LpMinimize)

# SETS
D = ['l', 'w', 'h']
B = list(range(num_boxes))

# VARS
pick = pulp.LpVariable.dicts('pick', indices=B, cat=pulp.LpBinary)  # 1 if picked, else 0
miss = pulp.LpVariable.dicts('miss', indices=D)  # the amount by which we missed the target

# OBJ: minimize the total miss distance; all dimensions are equally weighted
prob += pulp.lpSum(miss)

# CONSTRAINTS:
# Connect the "miss" variable to the selections * dimensions.
# Because "miss" represents the absolute value of the difference between
# target and total, we must constrain it for positive and negative misses.
for dim in D:
    prob += miss[dim] >= pulp.lpSum(boxes[box][dim] * pick[box] for box in B) - targets[dim]  # pos delta
    prob += miss[dim] >= -(pulp.lpSum(boxes[box][dim] * pick[box] for box in B) - targets[dim])  # neg delta

# force the correct number of selections
prob += pulp.lpSum(pick) == num_selections

# check it
# print(prob)

# SOLVE IT
cbc_path = '/opt/homebrew/opt/cbc/bin/cbc'  # this is likely not necessary for a normal install
solver = pulp.COIN_CMD(path=cbc_path)
res = prob.solve(solver)
# prob.solve()  # <-- this is normally OK instead of the 3 lines above

# make sure we got an optimal result
assert prob.status == pulp.LpStatusOptimal

tot_selections = sum(pick[b].varValue for b in B)
print(f'Selected {tot_selections} Boxes:')
for b in B:
    if pick[b].varValue > 0.1:  # screen out the zeros (and small numerical errors)
        print(boxes[b])

print('\nDimensional "Misses":')
for d in D:
    print(d, miss[d].varValue)
Result - Optimal solution found
Objective value: 152.00000000
Enumerated nodes: 0
Total iterations: 0
Time (CPU seconds): 0.00
Time (Wallclock seconds): 0.00
Option for printingOptions changed from normal to all
Total time (CPU seconds): 0.00 (Wallclock seconds): 0.00
Selected 5.0 Boxes:
{'l': 39, 'w': 136, 'h': 60}
{'l': 39, 'w': 140, 'h': 53}
{'l': 64, 'w': 125, 'h': 57}
{'l': 33, 'w': 161, 'h': 57}
{'l': 51, 'w': 156, 'h': 65}
Dimensional "Misses":
l 26.0
w 118.0
h 8.0
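One more note relevant to the asker's few-second budget: CBC also works as an anytime/approximate solver. With a recent pulp you can cap the wall time and/or accept a relative optimality gap (argument names may differ slightly across pulp versions), e.g.:

# Stop after 3 seconds or once within 1% of proven optimality; the solver
# returns its best incumbent, so relax the LpStatusOptimal assert accordingly.
solver = pulp.PULP_CBC_CMD(timeLimit=3, gapRel=0.01)
prob.solve(solver)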