Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given a set of ranges S, and an overlapping range R, find the smallest subset in S that encompases R

The following is a practice interview question that was given to me by someone, and I'm not sure what the best solution to this is:

Given a set of ranges:
(e.g. S = {(1, 4), (30, 40), (20, 91) ,(8, 10), (6, 7), (3, 9), (9, 12), (11, 14)}. And given a target range R (e.g. R = (3, 13) - meaning the range going from 3 to 13). Write an algorithm to find the smallest set of ranges that covers your target range. All of the ranges in the set must overlap in order to be considered as spanning the entire target range. (In this example, the answer would be {(3, 9), (9, 12), (11, 14)}.

What is the best way to solve this? I was thinking this would be done using a greedy algorithm. In our example above, we would look for all of the numbers that intersect with 3, and pick from those the one with the highest max. Then we would do the same thing with the one we just picked. So, since we picked (3, 9) we now want to find all of the ranges that intersect 9, and among those, we pick the one with the highest max. In that iteration, we picked (9, 12). We do the same thing to that one, and we find that the next range that intersects 12, with the highest max is (11, 14).

After that iteration, we see that 14 is greater than 13 (the max of our range), so we can stop.

The problem I'm having with this algorithm is, how do efficiently query the intersecting ranges? If we try a linear search, we end up with an algorithm that is O(n^2). My next thought was to cross off any of our intersecting ranges from our list each time we run through the loop. So in the first iteration, we cross of (1, 4) and (3, 9). In our next iteration we cross of (9, 12), (3, 9), and (8, 10). So by the last iteration, all we have to look through is {(30, 40), (20, 91), (6, 7)}. We could make this even more efficient by also crossing out everything that has a min > 13, and a max < 3. The problem is this still might not be enough. There is still the potential problem of having lots of duplicate sequences within the bounds of our range. If our list of ranges contained something like {(6, 7), (6, 7), (6, 7), (6, 7), (6, 7)} we would have to look through those each time, even though they aren't useful to us. Even if we were only to store unique values (by putting them all in a set), we might have a really big range, with a bunch of ranges that are inside of our target range, but we also have one range inside that spans almost the entire target range.

What would be an efficient way to query our ranges? Or possibly, what would be a more efficient algorithm to solving this problem?

like image 921
Ephraim Avatar asked Nov 02 '15 02:11

Ephraim


2 Answers

How about using an interval tree for queries? (https://en.m.wikipedia.org/wiki/Interval_tree) I'm not sure if greedy could work here or not. If we look at the last set of choices, overlapping with the high point in R, there's a possibility of overlap between the earlier choices for each one of those, for example:

R = (2,10) and we have (8,10) and (7,10) both overlapping with (6,8)

In that case, we only need to store one value for (6,8) as a second leg of the path; and visiting (6,8) again as we make longer paths towards the low point in R would be superfluous since we already know (6,8) was visited with a lower leg count. So your idea of eliminating intervals as we go makes sense. Could something like this work?

leg = 1
start with the possible end (or beginning) intervals
label these intervals with leg
until end of path is reached:
  remove the intervals labeled leg from the tree
  for each of those intervals labeled leg:
    list overlapping intervals in the chosen direction
  leg = leg + 1
  label the listed overlapping intervals with leg
like image 179
גלעד ברקן Avatar answered Nov 15 '22 07:11

גלעד ברקן


I can suggest following algorithm with complexity O(n log n) without using Intervals trees.

Let introduce some notation. We should cover a range (X,Y) by intervals (x_i,y_i).

First sort given intervals (x_i,y_i) by start point. It will take O(n log n)

Let select from intervals (x_i,y_i) with x_i <= X interval (x_k,y_k) with maximum of y_i. Because interval already sorted by start point, we can just increment index, while interval satisfies condition. If y_k less than X, there are no solution for given set and range. In other case interval (x_k,y_k) contains 'X' and has maximal end point among intervals containing X.

Now we need to cover an interval (y_k, Y), to satisfy overlapping condition. Because for all intervals containing X has end point less than y_k+1, we can start from last interval from the previous step.

Each interval was used only once in this stage, so the time complexity of this part is O(n) and in total O(n log n).

Following code snippet for solution:

intervals // given intervals from set S
(X, Y) // range to cover
sort intervals 
i = 0 // start index
start = X // start point
result_set // set to store result
while start <= Y && i < len(intervals): 
   next_start =  intervals[i].y
   to_add = intervals[i]
   while intervals[i].x <= start && i < len(intervals):
        if next_start > intervals[i].y:
           next_start = intervals[i].y
           to_add = intervals[i]
        i++
   if(next_start < start):
        print 'No solution'
        exit
   start = next_start
   result_set add to_add
like image 45
Alexander Kuznetsov Avatar answered Nov 15 '22 07:11

Alexander Kuznetsov