
Algorithm for fairly assigning tasks to workers based on skills

Tags: algorithm, task

(Before anyone asks, this is not homework.)

I have a set of workers with interests, i.e.:

  • Bob: Java, XML, Ruby

  • Susan: Java, HTML, Python

  • Fred: Python, Ruby

  • Sam: Java, Ruby

  • etc.

(There are actually somewhere in the range of 10-25 "interests" for each worker, and I have around 40-50 workers)

At the same time, I have a very large set of tasks that need to be distributed among the workers. Each task has to be assigned to at least 3 workers, and each of those workers must match at least one of the task's interests:

Task 1: Ruby, XML
Task 2: XHTML, Python

and so on. So Bob, Fred, or Sam could get Task 1; Susan or Fred could get Task 2.

This is all stored in a database thusly:

Task
    id integer primary key
    name varchar

TaskInterests
    task_id integer
    interest_id integer

Workers
    id integer primary key
    name varchar
    max_assignments integer

WorkerInterests
    worker_id
    interest_id

Assignments
    task_id
    worker_id
    date_assigned
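
To make the matching concrete, here is a rough sketch (with made-up IDs) of how these join tables translate into the "who is eligible for what" relation, assuming the rows have been pulled into Python:

from collections import defaultdict

# Hypothetical rows pulled from WorkerInterests and TaskInterests.
worker_interest_rows = [(1, 10), (1, 11), (2, 10), (2, 12)]  # (worker_id, interest_id)
task_interest_rows = [(100, 11), (100, 13), (101, 12)]       # (task_id, interest_id)

interests_by_worker = defaultdict(set)
for worker_id, interest_id in worker_interest_rows:
    interests_by_worker[worker_id].add(interest_id)

interests_by_task = defaultdict(set)
for task_id, interest_id in task_interest_rows:
    interests_by_task[task_id].add(interest_id)

# A worker is eligible for a task if they share at least one interest.
eligible_workers = {
    task_id: {w for w, w_ints in interests_by_worker.items() if w_ints & task_ints}
    for task_id, task_ints in interests_by_task.items()
}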

Each worker has a maximum number of assignments they will do, around 10. Some interests are rarer than others (only 1 or 2 workers have listed them as an interest), while some interests are common (half of the workers list them).

The algorithm must:

  • Assign every task to 3 workers (it is assumed that at least 3 of the workers are interested in one of the interests of the task).
  • Assign every worker 1 or more tasks

Ideally, the algorithm will:

  • Assign each worker a number of tasks proportional to their maximum assignments and the total number of tasks. For example, if Susan says she will do 20 tasks and most people will only do 10 tasks and there are 50 workers and 300 tasks, she should be assigned 12 tasks (20/10*(300/50)).
  • Assign a variety of tasks to each worker, so if Susan lists 4 interests she gets tasks that include 4 interests (rather than getting 10 tasks all with the same interest)

The most difficult aspect so far has been dealing with these issues:

  • tasks having interests with few corresponding workers
  • workers who have few interests, especially workers whose few interests match relatively few tasks
asked Jan 21 '11 by Jordan Reiter


3 Answers

This problem can be modeled as a Maximum Flow Problem.

In a max-flow problem, you have a directed graph with two special nodes, the source and the sink. The edges in the graph have capacities, and your goal is to assign a flow through the graph from the source to the sink without exceeding any of the edge capacities.

With a (very) carefully crafted graph, we can find an assignment meeting your requirements from the maximum flow.

Let me number the requirements.

Required:

1. Workers are assigned no more than their maximum assignments.
2. Tasks can only be assigned to workers that match one of the task's interests.
3. Every task must be assigned to 3 workers.
4. Every worker must be assigned to at least 1 task.

Optional:

5. Each worker should be assigned a number of tasks proportional to that worker's maximum assignments.
6. Each worker should be assigned a variety of tasks.

I will assume that the maximum flow is found using the Edmonds-Karp Algorithm.

Let's first find a graph that meets requirements 1-3.

Picture the graph as 4 columns of nodes, where edges only go from nodes in a column to nodes in the neighboring column to the right.

In the first column we have the source node. In the next column we will have nodes for each of the workers. From the source, there is an edge to each worker with capacity equal to that worker's maximum assignments. This will enforce requirement 1.

In the third column, there is a node for each task. From each worker in the second column there is an edge to each task that that worker is interested in with a capacity of 1 (a worker is interested in a task if the intersection of their interests is non-empty). This will enforce requirement 2. The capacity of 1 will ensure that each worker takes only 1 of the 3 slots for each task.

In the fourth column we have the sink. There is an edge from each task to the sink with capacity 3. This will enforce requirement 3.

Now, we find a maximum flow in this graph using the Edmonds-Karp Algorithm. If this maximum flow is less than 3 * (# of tasks) then there is no assignment meeting requirements 1-3. If not, there is such an assignment, and we can find it by examining the final residual graph: wherever the reverse edge from a task back to a worker has capacity 1, that worker is assigned to that task.
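
As a concrete (if rough) sketch of this construction, here is how it might look with networkx. The dictionaries are assumed to map worker/task ids to capacities or interest sets, as in the question's schema; all names here are made up:

import networkx as nx
from networkx.algorithms.flow import edmonds_karp

def build_assignment_graph(source_caps, interests_by_worker, interests_by_task, slots=3):
    # Columns: source -> workers -> tasks -> sink.
    G = nx.DiGraph()
    for w, cap in source_caps.items():
        G.add_edge("source", ("worker", w), capacity=cap)       # requirement 1
    for t, t_ints in interests_by_task.items():
        G.add_edge(("task", t), "sink", capacity=slots)         # requirement 3
        for w, w_ints in interests_by_worker.items():
            if w_ints & t_ints:                                 # requirement 2
                G.add_edge(("worker", w), ("task", t), capacity=1)
    return G

def extract_assignment(flow, workers):
    # Worker w gets task t wherever one unit flows on (worker, w) -> (task, t).
    return [(w, t) for w in workers
            for (_, t), f in flow.get(("worker", w), {}).items() if f == 1]

G = build_assignment_graph(max_assignments, interests_by_worker, interests_by_task)
flow_value, flow = nx.maximum_flow(G, "source", "sink", flow_func=edmonds_karp)
assignment = (extract_assignment(flow, max_assignments)
              if flow_value == 3 * len(interests_by_task) else None)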


Now, we will modify our graph and algorithm to meet the rest of the requirements.

First, let's meet requirement 4. This will require a small change to the algorithm. Initially, set all the capacities from the source to the workers to 1. Find the max-flow in this graph. If the flow is not equal to the number of workers, then there is no assignment meeting requirement 4. Otherwise, in your final residual graph, for each worker the edge from the source to that worker has capacity 0 and the reverse edge has capacity 1. Change these to that worker's maximum assignments - 1 and 0, respectively. Now continue the Edmonds-Karp algorithm as before. Basically, what we have done is first find an assignment such that each worker is assigned exactly one task, and then delete the reverse edge back to the source so that the worker will always be assigned at least one task (though it may not be the one assigned in the first pass).
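
Off-the-shelf libraries (networkx included) cannot resume Edmonds-Karp on a hand-modified residual graph, so a sketch of the same idea can instead pin each worker's phase-1 task before a second pass. Note this is slightly more restrictive than the reverse-edge trick described above (it fixes which task each worker keeps), so treat it only as an approximation of the answer; it reuses the helpers from the previous sketch:

def assign_with_floor(max_assignments, interests_by_worker, interests_by_task):
    # Phase 1: capacity 1 per worker; a flow equal to the number of workers
    # means every worker can be given at least one task (requirement 4).
    G1 = build_assignment_graph({w: 1 for w in max_assignments},
                                interests_by_worker, interests_by_task)
    value1, flow1 = nx.maximum_flow(G1, "source", "sink", flow_func=edmonds_karp)
    if value1 < len(max_assignments):
        return None
    pinned = extract_assignment(flow1, max_assignments)

    # Phase 2: keep the phase-1 assignments fixed, give each worker
    # max_assignments - 1 further units, and shrink the task slots already used.
    G2 = build_assignment_graph({w: cap - 1 for w, cap in max_assignments.items()},
                                interests_by_worker, interests_by_task)
    for w, t in pinned:
        G2.remove_edge(("worker", w), ("task", t))
        G2[("task", t)]["sink"]["capacity"] -= 1
    value2, flow2 = nx.maximum_flow(G2, "source", "sink", flow_func=edmonds_karp)
    if value1 + value2 < 3 * len(interests_by_task):
        return None
    return pinned + extract_assignment(flow2, max_assignments)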


Now let's meet requirement 5. Strictly speaking, this requirement just means that we divide each worker's maximum assignments by (the sum of all workers' maximum assignments / the number of tasks). This will quite possibly not have a satisfying assignment, but that's OK. Initialize our graph with these new maximum assignments. Run Edmonds-Karp. If it finds a flow that saturates the edges from the tasks to the sink, we are done. Otherwise, we can increment the capacities from the source to the workers in the residual graph and continue running Edmonds-Karp. Repeat until we saturate the edges into the sink. Don't increment the capacities so much that a worker is assigned more tasks than their maximum. Also, technically, the increment for each worker should be proportional to that worker's maximum assignments. Both are easy to do.
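
In sketch form, that incremental loop might look like the following (again reusing the helpers above). Each pass recomputes the flow from scratch, which is slower than continuing Edmonds-Karp on the residual graph but gives the same result; the exact quota and increment formulas are only illustrative:

def assign_proportionally(max_assignments, interests_by_worker, interests_by_task):
    num_tasks = len(interests_by_task)
    scale = sum(max_assignments.values()) / num_tasks
    # Start each worker at their proportional share of the tasks (requirement 5).
    caps = {w: max(1, int(m / scale)) for w, m in max_assignments.items()}
    while True:
        G = build_assignment_graph(caps, interests_by_worker, interests_by_task)
        value, flow = nx.maximum_flow(G, "source", "sink", flow_func=edmonds_karp)
        if value == 3 * num_tasks:                 # edges into the sink are saturated
            return extract_assignment(flow, max_assignments)
        if all(caps[w] == max_assignments[w] for w in caps):
            return None                            # even the full capacities are not enough
        # Relax: bump each worker roughly in proportion to their maximum,
        # but never beyond it.
        caps = {w: min(max_assignments[w], c + max(1, int(max_assignments[w] / scale)))
                for w, c in caps.items()}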


Finally, let's meet requirement 6. This one is a bit tricky. First, add a column between workers and tasks and remove all edges from workers to tasks. In this new column, for each worker add a node for each of that worker's interests. From each of these new nodes, add an edge to each task with a matching interest with capacity 1. Add an edge from each worker to each of its interest nodes with capacity 1. Now, a flow in this graph would enforce that if a worker is assigned to n tasks, then the intersection of the union of those tasks' interests with that worker's interests has size at least n. Again, it is possible that there is a satisfying assignment without this constraint but not one with it. We can handle this the same way as requirement 5: run Edmonds-Karp to completion; if there is no satisfying assignment, increment the capacities from workers to their interest nodes and repeat.

Note that in this modified graph we no longer satisfy requirement 3, as a single worker may be assigned to multiple/all slots of a task if the intersection of their interests has size greater than 1. We can fix that. Add a new column of nodes between the interest nodes and the task nodes and delete the edges between those nodes. For each worker, in the new column insert a node for each task (so each worker has its own node for each task). From these new nodes, add an edge with capacity 1 to their corresponding task on the right. From each worker's interest nodes to that worker's task nodes, add an edge with capacity 1 from each interest to each task that matches.

-

EDIT: Let me try to clarify this a little. Let -(n)-> be an edge with n capacity.

Previously we had worker-(1)->task for each worker-task pair with a matching interest. Now we have worker-(k)->local interest-(1)->local task-(1)->global task. Now you can think of a task being matched to a worker-interest pair. The first edge says that each of a worker's interests can be matched to k tasks. The second edge says that each of a worker's interests can only be matched once to each task. The third edge says that each task can only be assigned once to each worker. Note that you could push multiple units of flow from the worker to a local task (equal to the size of the intersection of their interests) but only 1 unit of flow from the worker to the global task node, due to the third edge.
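
A sketch of that layered construction in code (my reading of it, with made-up node labels; k is the worker-to-interest capacity that gets incremented when relaxing requirement 6):

import networkx as nx

def build_layered_graph(max_assignments, interests_by_worker, interests_by_task,
                        k=1, slots=3):
    G = nx.DiGraph()
    for w, cap in max_assignments.items():
        G.add_edge("source", ("worker", w), capacity=cap)
    for t in interests_by_task:
        G.add_edge(("task", t), "sink", capacity=slots)
    for w, w_ints in interests_by_worker.items():
        for i in w_ints:
            # worker -(k)-> local interest
            G.add_edge(("worker", w), ("interest", w, i), capacity=k)
        for t, t_ints in interests_by_task.items():
            shared = w_ints & t_ints
            if shared:
                # local task -(1)-> global task: at most one slot per worker per task
                G.add_edge(("wtask", w, t), ("task", t), capacity=1)
                for i in shared:
                    # local interest -(1)-> local task
                    G.add_edge(("interest", w, i), ("wtask", w, t), capacity=1)
    return G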

-

Also note that we can't really mix this incrementing with the one for requirement 5 correctly. However, we can run the whole algorithm once for each capacity {1,2,...,r} for worker->interest edges. We then need a way to rank the assignments. That is, as we relax requirement 5 we can better meet requirement 6 and vice versa. However, there is another approach that I prefer for relaxing these constraints.


A better approach to requirement relaxation (inspired-by/taken-from templatetypedef)

When we want to be able to relax multiple requirements (e.g. 5 and 6), we can model it as a min-cost max-flow problem. This may be simpler than the incremental search that I described above.

For example, for requirement 5, set all the edge costs to 0. We have the initial edge from the source to the worker with capacity equal to the worker's maximum assignments / (the sum of all workers' maximum assignments / the number of tasks) and with cost 0. Then you can add another edge with the remaining capacity for that worker and cost 1. Another possibility would be to use some sort of progressive cost, such that as you add tasks to a worker the cost to add another task to that worker goes up. E.g. you could instead split a worker's remaining capacity up into individual edges with costs 1, 2, 3, 4, ....

A similar thing could then be done between the worker nodes and the local-interest nodes for requirement 6. The weighting would need to be balanced to reflect the relative importance of the different requirements.

This method is also sufficient to enforce requirement 4. Also, the costs for requirement 5 should probably be scaled by each worker's maximum assignments, so that assigning 1 extra task to a worker with max 100 would not cost as much as assigning an extra task to a worker with max 2.
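
As a rough sketch of that formulation with networkx (which finds a minimum-cost flow among the maximum flows): a plain DiGraph cannot hold two parallel source->worker edges, so the over-quota capacity is routed through a helper node instead; the cost values and the quota formula here are illustrative only:

import networkx as nx

def build_min_cost_graph(max_assignments, interests_by_worker, interests_by_task, slots=3):
    num_tasks = len(interests_by_task)
    scale = max(1, sum(max_assignments.values()) // num_tasks)
    G = nx.DiGraph()
    for w, m in max_assignments.items():
        quota = min(m, max(1, m // scale))         # proportional share (requirement 5)
        G.add_edge("source", ("worker", w), capacity=quota, weight=0)
        if m > quota:
            # Over-quota units are still allowed but cost extra; scale this cost
            # by the worker's maximum if you want the behaviour described above.
            G.add_edge("source", ("extra", w), capacity=m - quota, weight=1)
            G.add_edge(("extra", w), ("worker", w), capacity=m - quota, weight=0)
    for t, t_ints in interests_by_task.items():
        G.add_edge(("task", t), "sink", capacity=slots, weight=0)
        for w, w_ints in interests_by_worker.items():
            if w_ints & t_ints:
                G.add_edge(("worker", w), ("task", t), capacity=1, weight=0)
    return G

G = build_min_cost_graph(max_assignments, interests_by_worker, interests_by_task)
flow = nx.max_flow_min_cost(G, "source", "sink")   # cheapest among the maximum flows
assignment = [(w, t) for w in max_assignments
              for (_, t), f in flow.get(("worker", w), {}).items() if f == 1]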


Complexity Analysis

Let n = # of employees, m = # of tasks, k = max interests for a single task/worker, l = # of interests, j = maximum of maximum assignments.

Requirements 3 and 4 together imply that n = O(m) (each worker needs at least one of the 3m task slots). Let's also assume that l = O(m) and j = O(m).

In the smaller graph (before the change for req. 6), the graph has n + m + 2 = O(m) vertices and at most n + m + k*min(n, m) = O(km) edges.

After the change it has 2 + n + n * l + n * m + m = O(nm) vertices and n + k * n + k * m * n + n * m + m = O(kmn) edges (technically we may need j * n + j * l more nodes and edges so that there are not multiple edges from one node to another, but this wouldn't change the asymptotic bound). Also note that no edge need have capacity > j.

Using the min-cost max-flow formulation, we can find a solution in O(V * E * B * log V) where V = # vertices, E = # edges, and B = max capacity on a single edge. In our case this gives O(k * j * n^2 * m^2 * log(nm)).

answered Sep 27 '22 by Chris Hopman


For problems where finding a direct solution is difficult, it can be a good idea to use an approximation algorithm, an evaluation function, and a method to iteratively improve the solution. There are a variety of approaches, such as genetic algorithms and simulated annealing.

The basic idea is to use some sort of simple algorithm (such as a greedy algorithm) to get something that is vaguely usable and make random modifications, keeping those modifications that improve the evaluation score and discarding those that make it worse.

With genetic algorithms, a population of, say, 100 random solutions is generated and scored; the best are kept and "bred" to produce a new generation of solutions with characteristics similar to the previous generation, but with some random mutations.

With simulated annealing, the probability of accepting a slightly worse solution is high initially but decreases over time. This reduces the risk of getting stuck at a local optimum early on.
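
As a sketch, the core accept/reject loop of simulated annealing is only a few lines; the score function (which would, for example, penalise unfilled task slots and over-assigned workers and reward variety) and the random_move function (e.g. reassigning one random task slot to another eligible worker) are placeholders you would supply:

import math
import random

def anneal(initial, score, random_move, start_temp=1.0, end_temp=0.01, cooling=0.995):
    # Keep improvements; accept slightly worse solutions with a probability
    # that shrinks as the temperature drops, to escape local optima early on.
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    temp = start_temp
    while temp > end_temp:
        candidate = random_move(current)
        candidate_score = score(candidate)
        delta = candidate_score - current_score          # higher score is better
        if delta >= 0 or random.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp *= cooling
    return best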

answered Sep 24 '22 by Mark Byers


Try mapping your task to the stable marriage problem. Tasks become prospective wives, and your staff become suitors.

You might want to add some extra algorithm for assigning each task's preferences over the staff, and vice-versa. For instance, you could assign some ideal proficiency necessary for the components of each task and then allow your staff to rank each task, and you could assign a proficiency for each component that each staff member possesses and use that to derive each task's preference over staff members.

Once you have the preferences, run the algorithm and post the results, then allow people to apply to you in pairs to swap assignments - after all, this is a people problem, and people work better when they have a degree of control.
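
For what it's worth, the proposing side of such a matching might be sketched roughly as below, in the hospitals/residents style (tasks propose and hold up to 3 workers). Note that this only gives each worker one task, so you would still need to extend it (e.g. with worker capacities) to fit the full problem; preference lists are assumed complete:

def match(task_prefs, worker_prefs, slots=3):
    # task_prefs:   {task: [workers in preference order]}
    # worker_prefs: {worker: [tasks in preference order]}
    rank = {w: {t: i for i, t in enumerate(prefs)} for w, prefs in worker_prefs.items()}
    held = {t: [] for t in task_prefs}      # workers currently held by each task
    engaged = {}                            # worker -> task currently holding them
    next_choice = {t: 0 for t in task_prefs}
    free = list(task_prefs)                 # tasks with open slots left to propose
    while free:
        t = free.pop()
        while len(held[t]) < slots and next_choice[t] < len(task_prefs[t]):
            w = task_prefs[t][next_choice[t]]
            next_choice[t] += 1
            if w not in engaged:
                engaged[w] = t
                held[t].append(w)
            elif rank[w][t] < rank[w][engaged[w]]:   # w prefers t to their current task
                old = engaged[w]
                held[old].remove(w)
                engaged[w] = t
                held[t].append(w)
                if old not in free:
                    free.append(old)                 # the jilted task proposes again later
    return held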

answered Sep 23 '22 by Paddy3118