A friend gave me a puzzle that he says can be solved in better than O(n^3) time.
Given a set of n jobs, each with a fixed start time and end time (overlaps are very possible), find the smallest subset of jobs such that every job is either in the subset or overlaps some job in the subset.
I'm pretty sure that the optimal solution is to pick the job with the most unmarked overlap, add it to the solution set, then mark it, and its overlap. And repeat until all jobs are marked.
Finding the job with the most unmarked overlappers is a simple scan of an adjacency matrix (O(n^2)), and this has to be redone every time a job is selected, in order to update the marks, making it O(n^3).
Is there a better solution?
1. Let A be the set of jobs which we haven't overlapped yet.
2. Pick the job x in A which has the minimal end time (call it t).
3. Among all jobs whose start time is less than t, pick the job j with the maximum end time.
4. Add j to the output set.
5. Remove all jobs which overlap j from A.
6. Repeat until A is empty.

A simple implementation will run in O(n^2). Using interval trees it's probably possible to solve in O(n log n).
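A minimal Python sketch of this greedy (the function name is mine; it assumes jobs are closed intervals (start, end), and uses an O(n) marking pass per pick, so it is the simple O(n^2) version):

```python
import bisect

def greedy_dominating_jobs(jobs):
    # jobs: list of (start, end) tuples with start <= end (closed intervals).
    # Returns a list of picked jobs such that every job is either picked
    # or overlaps a picked job.
    n = len(jobs)
    by_end = sorted(range(n), key=lambda i: jobs[i][1])
    by_start = sorted(range(n), key=lambda i: jobs[i][0])
    starts = [jobs[i][0] for i in by_start]
    # best[k] = index of the job with the maximal end time among the
    # k+1 earliest-starting jobs
    best = by_start[:]
    for k in range(1, n):
        if jobs[best[k - 1]][1] > jobs[best[k]][1]:
            best[k] = best[k - 1]

    covered = [False] * n
    picked = []
    for i in by_end:                 # uncovered job with minimal end time
        if covered[i]:
            continue
        t = jobs[i][1]
        # among jobs starting no later than t, pick the one ending last
        j = best[bisect.bisect_right(starts, t) - 1]
        picked.append(jobs[j])
        for y in range(n):           # mark everything overlapping j
            if jobs[y][0] <= jobs[j][1] and jobs[y][1] >= jobs[j][0]:
                covered[y] = True
    return picked
```

For example, on `[(1, 3), (2, 5), (4, 7), (6, 9), (10, 12)]` it picks `[(2, 5), (6, 9), (10, 12)]`, which is optimal since `(6, 9)` and `(10, 12)` overlap nothing in common.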
The basic idea behind why it's an optimal solution (not a formal proof): we have to pick at least one job whose start time is less than t, so that x will be overlapped. If we let S be the set of all jobs whose start time is less than t, it can be shown that j will overlap the same jobs as any other job in S, plus possibly more. Since we have to pick some job in S, the best choice is j. We can use this idea to form a proof by induction on the number of jobs.
We can achieve an O(n log n) solution with a dynamic programming approach. In particular, we want to consider the size of the smallest set that includes the k-th job and covers the first k jobs (ordered by start time), which we denote by S(k). We first add an auxiliary job (∞, ∞); the answer is then the DP value for this final job, minus one.
To compute S(k), consider the job p(k) which ends before job k starts but has maximal start time (if no such job exists, S(k) = 1). Note that p is an increasing function. S(k) will then be one more than the minimum S(i) over jobs i with end(i) > start(p(k)): such a job i overlaps p(k), and hence also overlaps every job that starts after i but does not overlap job k.
We can efficiently find this minimum by maintaining a min-heap of potential jobs, ordered by S(k). After computing each S(k), we add the job to the heap. When we want to get a job, we pop jobs from the top of the heap which end too early (since p is increasing, they can never become suitable again), until we find a suitable one. This will take a total of at most O(n log n), since we do at most O(n) of each heap operation (pop/peek/push).
The remainder of the task is to compute the p(k) values efficiently. One way to do this is to iterate over all job starts and ends in increasing time order, keeping track of the latest-starting job seen so far.
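Putting the pieces together, here is a straightforward Python sketch of the DP (the function name and the simple linear scans for p(k) and the minimum are my own; this version is O(n^2), and the heap plus the sweep described above would bring it down to O(n log n)). I use end(i) >= start(p(k)) rather than a strict inequality, assuming closed intervals:

```python
import math

def min_dominating_set_size(jobs):
    # jobs: list of (start, end) closed intervals.
    # S[k] = size of the smallest set that contains job k and covers
    # jobs 0..k, with jobs sorted by start time.
    jobs = sorted(jobs)
    jobs.append((math.inf, math.inf))    # auxiliary final job
    n = len(jobs)
    S = [0] * n
    for k in range(n):
        # p(k): the latest-starting job that ends before job k starts
        p = -1
        for i in range(k):
            if jobs[i][1] < jobs[k][0] and (p < 0 or jobs[i][0] > jobs[p][0]):
                p = i
        if p < 0:
            S[k] = 1                     # job k overlaps every earlier job
        else:
            # the previous pick i must overlap p(k): end(i) >= start(p(k))
            S[k] = 1 + min(S[i] for i in range(k) if jobs[i][1] >= jobs[p][0])
    return S[n - 1] - 1                  # the auxiliary job is not a real pick
```

On `[(1, 3), (2, 5), (4, 7), (6, 9), (10, 12)]` this returns 3, agreeing with the greedy answer above.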