Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm - Ideal Distribution Between Multiple Workshops & Timeframes

I'm looking for an algorithm in order to solve following problem:

Let's say I'm organizing a course with 300 attendees and 6 workshops divided over 3 Timeframes.

Every attendee has to register on a website, and choose which 3 workshops he would like to attend, together with 2 reserve selections.

Workshops are randomly divided over the timeframes, mostly the same workshop occurs in multiple timeframes. It doesn't matter in which timeframe the attendee follows the workshop.

The algorithm needs to generate the ideal spread of attendees over the different timeframes so they all have gotten there favorite workshops as much as possible...

Which technology can I use to generate this spread? Can I do it with ActionScript or PHP? Is there anyone with a good example?

Thanks a lot for your help!

like image 205
gert789 Avatar asked Jan 17 '12 16:01

gert789


1 Answers

In principle, for optimization problems like this you have the choice between exact solving methods like Linear Programming and Constraint Programming, and approximate methods like heuristics or any flavour of Local Search (which includes methods like Genetic Algorithms or Simulated Annealing).

For the problem size that you mentioned, I would definitely use an exact method, since only these guarantee that you find the global optimum. With approximate methods, you can only be sure that you found the global optimum if the cost measure has a value of zero (e.g., no constraint violations).

Version 1: Integer Programming

Your problem can be seen as a variant of Bin Packing. For this type of problem, Mixed Integer Programming (a variant of Linear Programming) is the best choice in my opinion. You will need a MIP solver, since you don't want to program one yourself. The probably best free one can be found in the COIN-OR library: CLP/CBC. It's okay-ish for small MIP problems, but may run into troubles with larger ones. For pure LP problems, it's quite good, but for your particular problem, you need integral decision variables, hence MIP. For industrial-sized MIP problems, you need a commercial solver. Take your pick of CPLEX, Xpress, or Gurobi. They are all excellent.

The problem can be modelled thus:

  • for each combination of attendee and workshop, you introduce a binary decision variable. The variable will be one if the attendee visits the workshop. In your example, there will be 1800 such variables.

  • for each attendee, the sum of the decision variables will be the number of visited workshops. In your example, this is three.

  • for each attendee, the sum of overlapping workshops is at most 1.

  • a unit cost is induced if an attendee must visit a reserve choice

  • decision variables for courses that an attendee has not selected are set to zero

The objective is then to minimize the overall cost.

Here's a quickly written example code in ECLiPSe (a variant of Prolog):

:- lib(eplex).
:- lib(random).
:- lib(listut).

:- local struct(attendee(choices, reserve, workshops)).

generate_attendees(NA, NW, NC, NR, Atts) :-
    seed(1), % always generate the same set
    ( for(I, 1, NW), foreach(I, WD) do true ),
    (
        for(_I, 1, NA),
        foreach(Att, Atts),
        param(NC, NR, WD)
    do
        Att = attendee{},
        generate_choices(Att, NC, NR, WD)
    ).

generate_choices(Att, NC, NR, WD) :-
    (
        for(_I, 1, NC),
        fromto(WD, DI, DO, RD),
        foreach(C, Choices)
    do
        select_course(DI, C, DO)
    ),
    (
        for(_I, 1, NR),
        fromto(RD, DI, DO, _),
        foreach(R, Reserve)
    do
        select_course(DI, R, DO)
    ),
    Att = attendee{choices:Choices, reserve:Reserve}.

select_course(L, E, RL) :-
    length(L, LL),
    random(R),
    N is R mod LL,
    nth0(N, L, E, RL).


solve_with_mip(Atts, NA, NW, NC, Excl) :-
    decision_vars(NA, NW, Decs),
    workshop_visits(Decs, NA, NW, NC),
    workshop_choices(Atts, Decs, NA, NW, Cost),
    workshop_exclusions(Decs, NA, Excl),
    % set up solver and solve
    eplex:eplex_solver_setup(min(Cost), Cost, [], []),
    eplex:eplex_solve(Result),
    printf("Found solution with cost: %w%n", [Result]),
    % retrieve solution
    eplex:eplex_get(vars,           Vars),
    eplex:eplex_get(typed_solution, Vals),
    Vars = Vals,
    retrieve_solution(Atts, Decs, NA, NW).


% make array of decision variables
decision_vars(NA, NW, Decs):-
    dim(Decs, [NA,NW]),
    (
        multifor(Idx, 1, [NA,NW]),
        foreach(D, Ds),
        param(Decs)
    do
        subscript(Decs, Idx, D),
        eplex:(D $>= 0),
        eplex:(D $=< 1)
    ),
    eplex:integers(Ds).


% each attendee visits NC workshops
workshop_visits(Decs, NA, NW, NC) :-
    (
        for(AIdx, 1, NA),
        param(Decs, NW, NC)
    do
        (
            for(WIdx, 1, NW),
            fromto(0, R, D+R, Sum),
            param(AIdx, Decs)
        do
            subscript(Decs, [AIdx, WIdx], D)
        ),
        eplex:(Sum $= NC)
    ).


% each attendee must not visit a workshop not in
%   her list of choices or reserve choices
% choosing a reserve workshop induces a unit cost
workshop_choices(Atts, Decs, NA, NW, Cost):-
    (
        for(AIdx, 1, NA),
        foreach(Att, Atts),
        fromto(0, CI, CO, Cost),
        param(Decs, NW)
    do
        Att = attendee{choices:Cs, reserve:Rs},
        (
            for(WIdx, 1, NW),
            fromto(CI, ACI, ACO, CO),
            param(Decs, AIdx, Cs, Rs)
        do
            (
                memberchk(WIdx, Cs)
            ->
                % choices do not induce cost
                ACO = ACI
            ;
                memberchk(WIdx, Rs)
            ->
                % reserves induce unit cost
                % (if decision variable is 1)
                subscript(Decs, [AIdx, WIdx], D),
                ACO = ACI + D
            ;
                % other workshops must not be chosen
                subscript(Decs, [AIdx, WIdx], D),
                eplex:(D $= 0),
                ACO = ACI
            )
        )
    ).


% some workshops overlap, so exclude each other
workshop_exclusions(Decs, NA, Excl) :-
    (
        foreach(W1-W2, Excl),
        param(Decs, NA)
    do
        (
            for(AIdx, 1, NA),
            param(Decs, W1, W2)
        do
            subscript(Decs, [AIdx, W1], D1),
            subscript(Decs, [AIdx, W2], D2),
            eplex:(D1+D2 $=< 1)
        )
    ).


% retrieve workshopnumbers for attendees
retrieve_solution(Atts, Decs, NA, NW) :-
    (
        for(AIdx, 1, NA),
        foreach(Att, Atts),
        param(Decs, NW)
    do
        (
            for(WIdx, 1, NW),
            fromto([], WI, WO, Ws),
            param(Decs, AIdx)
        do
            subscript(Decs, [AIdx, WIdx], D),
            ( D == 1 -> WO = [WIdx|WI] ; WO = WI )
        ),
        Att = attendee{workshops:Ws}
    ).


test(Atts) :-
    NA = 300,
    NW = 6,
    NC = 3,
    NR = 2,
    % list of exclusions
    Excl = [1-2, 1-3, 3-4, 5-6],
    generate_attendees(NA, NW, NC, NR, Atts),
    cputime(T1),
    solve_with_mip(Atts, NA, NW, NC, Excl),
    cputime(T2),
    D1 is T2-T1,
    printf("Found solution with MIP in %w seconds.%n", [D1]).

I have generated attendees and their choices randomly. The exclusion list is hard-coded. Here is the output generated for a run:

?- test(Atts).
Found solution with cost: 219.0
Found solution with MIP in 0.119999999999997 seconds.
Atts = [attendee([2, 3, 4], [5, 6], [6, 3, 2]), attendee(...), ...]
Yes (0.12s cpu)

I.e., in the solution, 219 times an attendee was placed in a reserve choice. Note that this is purely due to the overlap exclusion constraints, since there are no capacity constraints on the workshop sizes in the model. The selected workshops for the first attendee are 2, 3, and 6. The first choice of [2,3,4] was infeasible, since workshops 3 and 4 overlap. (I have edited out the remaining attendees from the answer)

For this test, I have used the free CLP/CBC solver from the COIN-OR library, which is included in the ECLiPSe distribution. COIN-OR also offers API libraries for C++ and Python.

Version 2: Constraint Logic Programming

Here is a second version, this time using Constraint Logic Programming. Constraint Programming is another exact solution method. Here, I use a different model:

  • for each attendee, construct a list of three variables. These variables denote the assigned workshops, and hence have the workshop numbers as domain. All three variables must have different values.

  • in order to break symmetries, I enforce that the variables must be increasing in their order.

  • the unwanted workshops are removed from the domains.

  • binding the variables to reserve choices induces unit costs

  • choosing a workshop for one of the variables removes any overlapping workshop from the domain of the other variables.

The key for successful Constraint Programming is selecting a clever labeling strategy, where the variables are bound to values. Since in this example, there are no capacity constraints on the workshops, one can simply choose preferred workshops until the domains contain only reserve workshops (due to the exclusion constraints). However, the value ordering is crucial here: one must start with the workshops with the fewest overlap.

If this is done, then no optimization will be necessary: the first solution will be optimal (thanks to the lack of capacity constraints). Otherwise, one will find a solution that is close to optimal, but then will have to traverse a huge search tree to find a better solution.

Here's the code, again in ECLiPSe:

:- lib(ic).
:- lib(random).
:- lib(lists).
:- lib(listut).

:- local struct(attendee(choices, reserve, workshops)).

solve_with_clp(Atts, NA, NW, NC, Excl) :-
    decision_vars_clp(NA, NW, NC, Decs),
    workshop_choices_clp(Atts, Decs, NA, NW, NC, CostSum),
    workshop_exclusions_clp(Decs, NA, NW, Excl, BestOrder),
    % solve
    Cost #= eval(CostSum),
    % order for labeling worskhops
    % start with workshop with fewest exclusions
    % should be computed!
    label(Decs, Atts, BestOrder),
    printf("Found solution with cost: %w%n", [Cost]),
    % retrieve solution
    retrieve_solution_clp(Atts, Decs, NA).

% search solution
label(Decs, Atts, BestOrder) :-
    (
        foreacharg(A, Decs),
        foreach(Att, Atts),
        param(BestOrder)
    do
        Att = attendee{choices:Cs, reserve:Rs},
        label_att(A, Cs, Rs, BestOrder)
    ).

label_att(A, Cs, Rs, BestOrder) :-
    (
        foreacharg(C, A),
        param(Cs, Rs, BestOrder)
    do
        (
            member(V, BestOrder),
            memberchk(V, Cs)
        ;
            member(V, BestOrder),
            memberchk(V, Rs)
        ),
        C #= V
    ).


% make array of decision variables
% for each attendee, one variable for every visited course is created
decision_vars_clp(NA, NW, NC, Decs):-
    dim(Decs, [NA,NC]),
    (
        multifor(Idx, 1, [NA,NC]),
        foreach(D, Ds),
        param(Decs)
    do
        subscript(Decs, Idx, D)
    ),
    Ds #:: 1..NW,
    % for each attendee, each variable must have a different value
    (
        for(AIdx, 1, NA),
        param(Decs, NC)
    do
        (
            for(CIdx, 1, NC),
            foreach(C, Cs),
            param(Decs, AIdx)
        do
            subscript(Decs, [AIdx,CIdx], C)
        ),
        alldifferent(Cs),
        % break symmetry by requiring ascending order
        Cs = [H|T],
        (
            foreach(C, T),
            fromto(H, L, C, _)
        do
            C #> L
        )
    ).


% each attendee must not visit a workshop not in
%   her list of choices or reserve choices
% choosing a reserve workshop induces a unit cost
workshop_choices_clp(Atts, Decs, NA, NW, NC, Cost):-
    (
        for(AIdx, 1, NA),
        foreach(Att, Atts),
        fromto(0, CI, CO, Cost),
        param(Decs, NW, NC)
    do
        Att = attendee{choices:Cs, reserve:Rs},
        % make list of costs
        functor(RCost, c, NW),
        (
            for(I, 1, NW),
            param(Rs, RCost)
        do
            ( memberchk(I, Rs) -> arg(I, RCost, 1) ; arg(I, RCost, 0) )
        ),
        RCost =.. [_|RCL],
        % remove unwanted workshops
        (
            for(CIdx, 1, NC),
            param(Decs, AIdx, Cs, Rs, NW)
        do
            subscript(Decs, [AIdx, CIdx], C),
            (
                for(I, 1, NW),
                param(Cs, Rs, C)
            do
                (
                    ( memberchk(I, Cs) ; memberchk(I, Rs) )
                ->
                    true
                ;
                    C #\= I
                )
            )
        ),
        % add costs for workshops
        (
            for(CIdx, 1, NC),
            fromto(CI, ACI, ACO, CO),
            param(Decs, AIdx, RCL)
        do
            subscript(Decs, [AIdx, CIdx], C),
            element(C, RCL, CCost),
            ACO = ACI+CCost
        )
    ).


% some workshops overlap, so exclude each other
%   assumption: W1 < W2
% also compute best order to label workshops:
%   start with lowest number of overlaps
workshop_exclusions_clp(Decs, NA, NW, Excl, BestOrder) :-
    (
        foreach(W1-W2, Excl),
        param(Decs, NA)
    do
        (
            for(AIdx, 1, NA),
            param(Decs, W1, W2)
        do
            subscript(Decs, [AIdx], ACs),
            ACs =.. [_|ACList],
            (
                fromto(ACList, [H|T], T, [_]),
                param(W1, W2)
            do
                (
                    foreach(N, T),
                    param(H, W1, W2)
                do
                    ( H #= W1 => N #\= W2 ),
                    ( N #= W2 => H #\= W1 )
                )
            )
        )
    ),
    % compute best order for labeling workshops
    (
        for(W, 1, NW),
        foreach(C-W, CWs),
        param(Excl)
    do 
        (
            foreach(W1-W2, Excl),
            fromto(0, I, O, C),
            param(W)
        do
            ( memberchk(W, [W1,W2]) -> O is I+1 ; O = I )
        )
    ),
    sort(CWs, SCWs),
    ( foreach(_-W, SCWs), foreach(W, BestOrder) do true ).


% retrieve workshop numbers for attendees
retrieve_solution_clp(Atts, Decs, NA) :-
    (
        for(AIdx, 1, NA),
        foreach(Att, Atts),
        param(Decs)
    do
        subscript(Decs, [AIdx], AD),
        AD =.. [_|Ws],
        Att = attendee{workshops:Ws}
    ).


test(Atts1) :-
    NA = 300,
    NW = 6,
    NC = 3,
    NR = 2,
    % list of exclusions
    Excl = [1-2, 1-3, 3-4, 5-6],
    generate_attendees(NA, NW, NC, NR, Atts1),
    cputime(T1),
    solve_with_clp(Atts1, NA, NW, NC, Excl),
    cputime(T2),
    D is T2-T1,
    printf("Found solution with CLP in %w seconds.%n", [D]).

Note that the problem generation predicates are the same as in the MIP solution. Here's the output for one run:

?- test(A).
Found solution with cost: 219
Found solution with CLP in 0.330000000000002 seconds.
A = [attendee([2, 3, 4], [5, 6], [2, 4, 5]), ...]
Yes (0.34s cpu, solution 1, maybe more)

As you can see, it's somewhat slower than the MIP solution. Also, the actual solution is slightly different, although it has the same cost.

Which version should you choose? It depends on what further constraints you expect to add. If it's capacity constraints, use MIP. If there it's more complicated constraints like scheduling constraints, then CLP will be better. With a system like ECLiPSe you could also construct hybrid algorithms.

like image 108
twinterer Avatar answered Sep 22 '22 18:09

twinterer