The problem I am trying to solve has a list of intervals on the number line, each with a pre-defined score. I need to return the maximum possible total score.
The catch is that the intervals overlap, and of the overlapping intervals I can use only one. Here is an example.
Intervals - Score 0- 5 - 15 4- 9 - 18 10-15 - 12 8-21 - 19 25-30 - 25
Here, the intervals 0-5, 4-9 and 8-21 overlap.
The intervals 10-15 and 8-21 also overlap.
The maximum sum would be 55 (18+12+25).
It is important to note here that we select the interval 4-9 of the first batch of overlapping intervals even though it does not have the highest score of the three.
This is because selecting the interval 8-21 would prevent us from using the interval 10-15 later on, thereby reducing the overall sum (In that case, the overall sum would be 19+25=44).
I am looking for an O(nlogn) or O(n) solution to this problem. I think dynamic programming can be used, but I may be wrong. Could someone suggest a solution/algorithm(s) that could do the trick here?
Edit: The intervals are in no particular order.
This is a weighted variation of interval scheduling; it's solvable in O(N log N)
with dynamic programming.
Let an interval be g(start, stop, score)
, and let them be sorted by stop
. For simplicity, let's assume for now that all stop
is unique.
Let best[i]
be the best score we can get when we're allowed to use g[1], ..., g[i]
. We don't have to use them all, of course, and generally we can't because the subset of intervals we use must be non-overlapping.
best[0] = 0
. That is, since we can't use any interval, the best score we can get is 0.1 <= k <= N
, we have: best[k] = max( best[k-1], best[j] + g[k].score )
, where j
is the largest index such that g[j].stop < g[k].start
(j
may be zero)That is, given that we're allowed to use g[1], ... g[k]
, the best we can do is the better scoring of these two options:
g[k]
. Thus, the score of this option is best[k-1]
. g[1], ... g[k-1]
g[k]
, and to its left we do the best we can with all the genes that don't overlap with g[k]
, i.e. all g[1], ..., g[j]
, where g[j].stop < g[k].start
and j
is as large as possible. Thus, the score of this option is best[j] + g[k].score
.(Note the optimal substructure and overlapping subproblems components of dynamic programming embodied in the above equation).
The overall answer to the question is best[N]
, i.e. the best score we can get when we're allowed to use all the genes. Oops, did I say genes? I mean intervals.
This is O(N log N)
because:
O(N log N)
j
for each k
is O(log N)
using binary searchIf several genes can have the same stop
values, then nothing changed: you still have to search for the rightmost j
. In e.g. Python this is easy with bisect_right
. In Java where the standard library binary search doesn't guarantee which index is returned in case of ties, you can (among many options) follow it with a linear search (for O(N)
worst-case performance), or another series of binary searches to find the right most index.
Oops did I say genes again? I mean intervals.
First of all, I think the maximum is 59, not 55. If you choose intervals [0-5],[8-21], and [25,30], you get 15+19+25=59. You can use some sort of dynamic programming to handle this.
First, you sort all the intervals by their starting point, then iterate from end to start. For each item in list, you choose the maximum sum from that point to the last as max(S[i]+S[j], S[i+1])
, where i is the item you are on, j is the item that is the first non-overlapping entry following your item (that is, the first item whose start is larger than the current item's end). To speed up the algorithm, you want to store the maximum partial sum S[j] for each element.
To clarify, let me solve your example according to this. First, sort your intervals:
1: 0- 5 - 15 2: 4- 9 - 18 3: 8-21 - 19 4: 10-15 - 12 5: 25-30 - 25
So,
S[5] = 25 S[4] = max(12+S[5], 25)=37 S[3] = max(19+S[5], S[4])=max(19+25,37)=44 S[2] = max(18+S[4], S[3])=max(18+37,44)=55 S[1] = max(15+S[3], S[2])=max(15+44, 55)=59
This is an adaptation of the algorithm in this post, but unfortunately, doesn't have the nice O(n) running time. A degenerate list where each entry overlaps the next would cause it to be O(n^2).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With