Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Iterating to figure out combinations of variable number of elements

I'm making a program that uses dynamic programming to decide how to distribute some files (Movies) among DVDs so that it uses the least number of DVDs.

After much thought I decided that a good way to do it is to look at every possible combination of movies that is less than 4.38 GB (The actual size of a DVD ) and pick the largest one (i.e. the one that wastes the least space) and remove the movies in the most efficient combination and repeat until it run out of movies.

The problem is that I don't know how to loop so I can figure out every possible combination, given that movies vary in size, so a specific number of nested loops cannot be used.

pseudo-code:

Some kind of loop:
    best_combination=[]
    best_combination_size=0
    if current_combination<4.38 and current_combination>best_combination_size:
        best_combination=current_combination
        best_combination_size=best_combination_size
print(best_combination)
delete best_combination from list_of_movies

first time to post a question..so go easy on me guys!! Thanks in advance

P.S. I figured out a way to do it using Dijkstra's which I think would be fast but not memory friendly. If anybody is interested i would gladly discuss it.

like image 390
Mazen Sharkawy Avatar asked Nov 08 '22 08:11

Mazen Sharkawy


1 Answers

You should really stick to common bin-packing heuristics. The wikipedia article gives a good overview of approaches including links to problem-tailored exact approaches. But always keep in mind: it's an np-complete problem!

I will show you some example supporting my hint, that you should stick to heuristics.

The following python-code:

  • creates parameterized random-problems (normal-distribution on multiple means/stds; acceptance-sampling to make sure that no movie is bigger than a DVD)
  • uses some random binpacking-library which implements some greedy-heuristic (i didn't try or test this lib before; so no guarantees!; no idea which heuristic is used)
  • uses a naive mixed-integer programming implementation (which is solved by a commercial solver; open-source solvers like cbc struggle, but might be used for good approximate solutions)

Code

import numpy as np
from cvxpy import *
from time import time

""" Generate some test-data """
np.random.seed(1)
N = 150  # movies
means = [700, 1400, 4300]
stds = [100, 300, 500]
DVD_SIZE = 4400

movies = []
for movie in range(N):
    while True:
        random_mean_index = np.random.randint(low=0, high=len(means))
        random_size = np.random.randn() * stds[random_mean_index] + means[random_mean_index]
        if random_size <= DVD_SIZE:
            movies.append(random_size)
            break

""" HEURISTIC SOLUTION """
import binpacking
start = time()
bins = binpacking.to_constant_volume(movies, DVD_SIZE)
end = time()
print('Heuristic solution: ')
for b in bins:
    print(b)
print('used bins: ', len(bins))
print('used time (seconds): ', end-start)

""" Preprocessing """
movie_sizes_sorted = sorted(movies)
max_movies_per_dvd = 0
occupied = 0
for i in range(N):
    if occupied + movie_sizes_sorted[i] <= DVD_SIZE:
        max_movies_per_dvd += 1
        occupied += movie_sizes_sorted[i]
    else:
        break

""" Solve problem """
# Variables
X = Bool(N, N)  # N * number-DVDS
I = Bool(N)     # indicator: DVD used

# Constraints
constraints = []
# (1) DVDs not overfilled
for dvd in range(N):
    constraints.append(sum_entries(mul_elemwise(movies, X[:, dvd])) <= DVD_SIZE)
# (2) All movies distributed exactly once
for movie in range(N):
    constraints.append(sum_entries(X[movie, :]) == 1)
# (3) Indicators
for dvd in range(N):
    constraints.append(sum_entries(X[:, dvd]) <= I[dvd] * (max_movies_per_dvd + 1))

# Objective
objective = Minimize(sum_entries(I))

# Problem
problem = Problem(objective, constraints)
start = time()
problem.solve(solver=GUROBI, MIPFocus=1, verbose=True)
#problem.solve(solver=CBC, CliqueCuts=True)#, GomoryCuts=True, KnapsackCuts=True, verbose=True)#, GomoryCuts=True, MIRCuts=True, ProbingCuts=True,
              #CliqueCuts=True, FlowCoverCuts=True, LiftProjectCuts=True,
              #verbose=True)
end = time()

""" Print solution """
for dvd in range(N):
    movies_ = []
    for movie in range(N):
        if np.isclose(X.value[movie, dvd], 1):
            movies_.append(movies[movie])
    if movies_:
        print('DVD')
        for movie in movies_:
            print('     movie with size: ', movie)

print('Distributed ', N, ' movies to ', int(objective.value), ' dvds')
print('Optimizatio took (seconds): ', end-start)

Partial Output

Heuristic solution:
-------------------
('used bins: ', 60)
('used time (seconds): ', 0.0045168399810791016)

MIP-approach:
-------------
Root relaxation: objective 2.142857e+01, 1921 iterations, 0.10 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   21.42857    0  120  106.00000   21.42857  79.8%     -    0s
H    0     0                      68.0000000   21.42857  68.5%     -    0s
H    0     0                      63.0000000   21.42857  66.0%     -    0s
     0     0   21.42857    0  250   63.00000   21.42857  66.0%     -    1s
H    0     0                      62.0000000   21.42857  65.4%     -    2s
     0     0   21.42857    0  256   62.00000   21.42857  65.4%     -    2s
     0     0   21.42857    0  304   62.00000   21.42857  65.4%     -    2s
     0     0   21.42857    0  109   62.00000   21.42857  65.4%     -    3s
     0     2   21.42857    0  108   62.00000   21.42857  65.4%     -    4s
    40     2   27.61568   20   93   62.00000   27.61568  55.5%   110    5s
H  156    10                      61.0000000   58.00000  4.92%  55.3    8s
   262     4   59.00000   84   61   61.00000   59.00000  3.28%  44.2   10s
   413    81 infeasible  110        61.00000   59.00000  3.28%  37.2   15s
H  417    78                      60.0000000   59.00000  1.67%  36.9   15s
  1834  1212   59.00000  232   40   60.00000   59.00000  1.67%  25.7   20s
...
...
 57011 44660 infeasible  520        60.00000   59.00000  1.67%  27.1  456s
 57337 44972   59.00000  527   34   60.00000   59.00000  1.67%  27.1  460s
 58445 45817   59.00000   80   94   60.00000   59.00000  1.67%  26.9  466s
 59387 46592   59.00000  340   65   60.00000   59.00000  1.67%  26.8  472s

Analysis

Some observations regarding the example above:

  • the heuristic obtains some solution of value 60 instantly
  • the commercial-solver needs more time but also find a solution of value 60 (15 secs)
    • also tries to find a better solution or proof that there is no one (MIP-solvers are complete = find optimal solution or proof there is none given infinite time!)
    • no progress for some time!
    • but: we got proof, that there is at best a solution of size 59
    • = maybe you will save one DVD by solving the problem optimally; but it's hard to find a solution and we don't know if this solution exists (yet)!

Remarks

  • The observations above are heavily dependent on data-statistics
  • It's easy to try other problems (maybe smaller) where the commercial MIP-solver finds a solution which uses 1 less DVD (e.g. 49 vs. 50)
    • It's not worth it (remember: open-source solvers are struggling even more)
    • The formulation is very simple and not tuned at all (don't blame only the solvers)
  • There are exact algorithms (which might be much more complex to implement) which could be appropriate

Conclusion

Heuristics are very easy to implement and provide very good solutions in general. Most of these also come with very good theoretical guarantees (e.g. at most 11/9 opt + 1 #DVDs compared to optimal solution are used = first fit decreasing heuristic). Despite the fact, that i'm keen on optimization in general, i probably would use the heuristics-approach here.

The general problem is also very popular, so that there should exist some good library for this problem in many programming-languages!

like image 171
sascha Avatar answered Dec 03 '22 19:12

sascha