Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a standard algorithm to balance overlapping objects into buckets?

I have a bunch of users, with a given start and end time, e.g.:

{ Name = "Peter", StartTime = "10:30", EndTime = "11:00" },
{ Name = "Dana", StartTime = "11:00", EndTime = "12:30" },
{ Name = "Raymond", StartTime = "10:30", EndTime = "14:00" },
{ Name = "Egon", StartTime = "12:00", EndTime = "13:00" },
{ Name = "Winston", StartTime = "10:00", EndTime = "12:00" }

I want to put them into buckets, based on times that they overlap (based on a configurable threshold, e.g., they need to overlap at least half an hour). I want buckets to be ideally 4 items big, but any range from 2-5 is acceptable.

In the example above, no 4 people match, so I'd have a bucket of 3 (Peter, Raymond, Winston) and one of 2 (Dana, Egon).

I've prototyped an algorithm that seems to rely on chance rather than science:

  1. Order the List by StartTime
  2. Create an empty bucket
  3. Pick the first user from the List
  4. Check that user against all users in the bucket
  5. If that user overlaps with everyone in the bucket, put that person in it and remove it from the list
  6. If the bucket has the ideal size (4) or if I'm looping and checking the same user more than three times, close the bucket and create a new, empty one

This works well for the first few buckets, but leads to buckets with only 2 people that could be combined better.

I could change the algorithm to remove all ideal buckets from the list and reshuffle and try some more, but I feel that this should be a common problem - it's like shift assignments for workers, or maybe the knapsack problem.

Does anyone know a standard algorithm for this type of problem?

(Tagged combinatorics because I think this is the area of math it applies, correct me if wrong)

like image 300
Michael Stum Avatar asked Jun 26 '15 23:06

Michael Stum


1 Answers

tl;dr: dynamic programming for the win (O(sort(n)) time).

First, note that bucketing contiguously in start-time order is fine.

Proposition (defragging): Let a, b, c, d be distinct users such that StartTime(a) ≤ StartTime(b) ≤ StartTime(c) ≤ StartTime(d). If X and Y are valid buckets such that a, c ∈ X and b, d ∈ Y, then X - {c} ∪ {b} and Y - {a} ∪ {d} are valid buckets as well.

I only know how to prove this by tedious case analysis (omitted).

The upshot is, you can pretend as though you're breaking a paragraph into lines, where the “paragraph“ is the list of users in start-time order, and each “line” is a bucket. There is an algorithm due to Knuth and Plass for line breaking optimally, where the penalty for a given line is a more or less arbitrary function. You could make buckets of 4 cost 0, buckets of 3 cost 1, buckets of 2 cost 2, and buckets of 1 cost 100, for example.

like image 121
David Eisenstat Avatar answered Sep 18 '22 00:09

David Eisenstat