
Find the smallest set of overlapping jobs

A friend gave me a puzzle that he says can be solved in better than O(n^3) time.

Given a set of n jobs that each have a fixed start time and end time (overlaps are very possible), find the smallest subset such that every job is either in the subset or overlaps a job in the subset.

I'm pretty sure that the optimal solution is to pick the job with the most unmarked overlaps, add it to the solution set, then mark it and every job it overlaps, and repeat until all jobs are marked.
Finding the job with the most unmarked overlappers takes O(n^2) with a simple adjacency matrix, and this has to be redone every time a job is selected in order to update the marks, making the whole thing O(n^3).
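To make the procedure concrete, here is a minimal Python sketch of the heuristic described above. The function name `greedy_most_unmarked`, the representation of jobs as `(start, end)` tuples with `start < end`, and the choice to treat jobs that only touch at an endpoint as non-overlapping are all assumptions made for illustration. The sketch only shows the O(n^3) procedure; it says nothing about whether the result is optimal.

```python
def greedy_most_unmarked(jobs):
    """Repeatedly pick the job that overlaps the most unmarked jobs.

    jobs: list of (start, end) tuples with start < end (assumed).
    Returns the indices of the selected jobs.
    """
    n = len(jobs)
    # Adjacency matrix: overlaps[i][k] is True when jobs i and k overlap.
    # Each job overlaps itself because start < end.
    overlaps = [[a[0] < b[1] and b[0] < a[1] for b in jobs] for a in jobs]
    marked = [False] * n
    chosen = []
    while not all(marked):
        # O(n^2) scan per round: count unmarked overlappers of every job.
        best = max(
            range(n),
            key=lambda i: sum(
                1 for k in range(n) if overlaps[i][k] and not marked[k]
            ),
        )
        chosen.append(best)
        for k in range(n):
            if overlaps[best][k]:
                marked[k] = True
    return chosen
```

Each round marks at least one new job, so there are at most n rounds of O(n^2) work.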

Is there a better solution?

kwiqsilver asked Jan 31 '12 09:01


2 Answers

Let A be the set of jobs which we haven't overlapped yet.

  1. Find the job x in A which has the minimal end time (t).
  2. From all jobs whose start time is less than t: pick the job j with the maximum end time.
  3. Add j to the output set.
  4. Remove all jobs which overlap j from A.
  5. Repeat 1-4 until A is empty.

A simple implementation runs in O(n^2). Using interval trees, it is probably possible to solve this in O(n log n).
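The steps above can be sketched as a straightforward O(n^2) Python function. This is only an illustration under stated assumptions: the name `min_cover_jobs` is made up, jobs are `(start, end)` tuples with `start < end`, jobs that merely touch at an endpoint are treated as non-overlapping, and step 2 is read as considering all jobs (including ones already removed from A).

```python
def min_cover_jobs(jobs):
    """Greedy selection following steps 1-5.

    jobs: list of (start, end) tuples with start < end (assumed).
    Returns the indices of the selected jobs, in selection order.
    """
    def overlaps(a, b):
        # Strict comparison: touching endpoints do not count (assumption).
        return a[0] < b[1] and b[0] < a[1]

    uncovered = set(range(len(jobs)))  # the set A
    chosen = []
    while uncovered:
        # Step 1: minimal end time t among jobs not yet overlapped.
        t = min(jobs[i][1] for i in uncovered)
        # Step 2: among all jobs starting before t, take the latest-ending.
        j = max(
            (i for i in range(len(jobs)) if jobs[i][0] < t),
            key=lambda i: jobs[i][1],
        )
        # Step 3: add j to the output.
        chosen.append(j)
        # Step 4: drop every job that overlaps j (this always removes x).
        uncovered = {i for i in uncovered if not overlaps(jobs[i], jobs[j])}
    return chosen
```

For example, `min_cover_jobs([(0, 1), (0.5, 2), (1.5, 10), (3, 4)])` selects the jobs at indices 1 and 2.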

The basic idea behind why it's an optimal solution (not a formal proof): We have to pick one job whose start time is less than t, so that x will be overlapped. Let S be the set of all jobs whose start time is less than t. Every job still in A ends at or after t, so whether a job in S overlaps it depends only on how late that job in S ends. Hence j overlaps every job in A that any other job in S overlaps, plus possibly more. Since we have to pick one job in S, the best choice is j. We can use this idea to form a proof by induction on the number of jobs.

interjay answered Dec 22 '22 01:12


We can achieve an O(n log n) solution with a dynamic programming approach. In particular, we want to consider the size of the smallest set that includes the kth job and covers the first k jobs (ordered by start time), which we denote by S(k). We should first add an auxiliary job (∞,∞), so the result will be our DP solution for this final job minus one.

To compute S(k), consider the job p(k) which ends before job k starts and, among such jobs, has the maximal start time. Note that p is an increasing function. S(k) will then be one more than the minimum S(i) with end(i) > start(p(k)).

We can efficiently find this job by maintaining a min-heap of candidate jobs ordered by S(k). After computing each S(k), we add the job to the heap. When we want to get a job, we remove jobs from the top of the heap which end too early, until we find a suitable one. Because p is increasing, a job that ends too early for one k also ends too early for every later k, so discarding it is safe. This takes O(n log n) in total, since we do at most O(n) of each heap operation (pop/peek/push).
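The heap bookkeeping in this step can be illustrated in isolation. The sketch below is not the full DP; it only shows a min-heap with lazy removal of entries that end too early, using Python's `heapq`. The class name `LazyMinHeap` and its method names are hypothetical, and the sketch assumes that the thresholds passed to the query never decrease.

```python
import heapq


class LazyMinHeap:
    """Min-heap of (cost, end) pairs with lazy removal of stale entries.

    Assumes query thresholds are non-decreasing, so an entry that ends
    too early once can be discarded for good.
    """

    def __init__(self):
        self._heap = []

    def push(self, cost, end):
        # cost plays the role of S(i); end is the job's end time.
        heapq.heappush(self._heap, (cost, end))

    def min_cost_ending_after(self, threshold):
        # Discard entries at the top whose end time is not after threshold.
        while self._heap and self._heap[0][1] <= threshold:
            heapq.heappop(self._heap)
        # Peek at the cheapest remaining entry, if any.
        return self._heap[0][0] if self._heap else None
```

Each entry is pushed once and popped at most once, which is where the O(n log n) total comes from.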

The remainder of the task is to compute the p(k) values efficiently. One way to do this is to iterate over all job start and end times in increasing order, keeping track of the latest-starting job that has already ended.
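A minimal Python sketch of that sweep is below. The function name `latest_start_predecessors` is hypothetical, jobs are assumed to be `(start, end)` tuples with `start < end`, and a job that ends exactly when job k starts is treated as ending before it (an assumption about how ties are handled).

```python
def latest_start_predecessors(jobs):
    """For each job k, find p(k): the latest-starting job that ends
    no later than job k starts, or None if there is no such job.

    jobs: list of (start, end) tuples with start < end (assumed).
    """
    events = []
    for k, (start, end) in enumerate(jobs):
        events.append((end, 0, k))    # kind 0: end event
        events.append((start, 1, k))  # kind 1: start event
    # At equal times, end events (0) sort before start events (1).
    events.sort()

    p = [None] * len(jobs)
    best = None  # latest-starting job among those that have ended so far
    for _, kind, k in events:
        if kind == 0:
            if best is None or jobs[k][0] > jobs[best][0]:
                best = k
        else:
            p[k] = best
    return p
```

For the jobs `[(0, 1), (0.5, 2), (1.5, 10), (3, 4)]` this returns `[None, None, 0, 1]`. Sorting the 2n events dominates the cost, so the sweep is O(n log n).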

Nabb answered Dec 22 '22 01:12