Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sub O(n^2) algorithm for counting nested intervals?

We have a list of intervals of the form [ai, bi]. For each interval, we want to count the number of other intervals that are nested within it.

For example, if we had two intervals, A = [1,4] and B = [2,3]. Then the count for B would be 0 as there are no nested intervals for B; and the count for A would be 1 as B fits within A.

My question is, does there exist a sub- O(n2) algorithm for this problem where n is the number of intervals?

EDIT: Here are the conditions the intervals meet. The end points of the intervals are floating point numbers. The lower limit for the ai's/bi's is 0 and the upper limit is whatever max float is. Also, there is the condition that ai < bi, so no intervals of length 0.

like image 828
mandy Avatar asked Oct 18 '12 03:10

mandy


2 Answers

Yes, it is possible.

We will borrow the typical computational geometry "scan line" trick.

First, let's answer an easier (but closely related) question. Instead of reporting how many other intervals each interval contains, let's report how many intervals each is contained in. So for your example with only two intervals, interval I0 = [1,4] has value zero because it is contained in zero intervals, while I1 = [2,3] has value one because it is contained in one interval.

You will see in a minute (a) why this question is easier and (b) how it leads to the answer for the original question.

To solve this easier question: Take all starting and ending points -- all of the ai and bi -- and put them into a master list. Call each element of this list an "event". So an event would be something like "interval I37 started" or "interval I23 ended".

Sort this list of events and process it in order.

As you process the list of events, maintain a set S of "active intervals". An interval is "active" if we have encountered its start event but not its ending event; that is, if we are within that interval.

Now, whenever we see an ending event bj, we are ready to compute how many intervals contain Ij (= [aj, bj]). All we need to do is examine the set S of active intervals and determine how many of them started before aj. That is our answer for how many intervals contain interval Ij.

To do this efficiently, keep S itself sorted by starting point; e.g., by using a self-balancing binary tree.

Sorting the list of events is O(2n log 2n) = O(n log n). Adding or removing an element from a self-balancing binary tree is O(log n). Asking "how many elements of the self-balancing binary tree are less than x?" is also O(log n). Therefore this entire algorithm is O(n log n).

So, that solves the easy question. Call that the "easy algorithm". Now for what you actually asked.

Think of the number line as extending to infinity and wrapping around to -infinity, and define an interval with bi < ai to start at ai, stretch to infinity, wrap to minus infinity, and end at bi.

For any interval Ij = [aj, bj], define Complement(Ij) as the interval [bj, aj]. (For example, the interval [2, 3] starts at 2 and ends at 3; so Complement([2,3]) = [3,2] starts at 3, stretches to infinity, wraps to -infinity, and ends at 2.)

Observe that interval I contains interval J if and only if Complement(J) contains Complement(I). (Prove this.)

So, we can answer your original question simply by running the "easy algorithm" on the set of complements of all of the intervals. That is, start your scan at -infinity with the set S of "active intervals" containing all intervals (because all complements contain infinity/-infinity). Keep S sorted by end point (i.e. start point of complement).

Sort all start points and end points and process them in order. When you encounter a starting point for interval Ij (= [aj, bj]), you are actually hitting the end point of its complement... So remove Ij from S, query S to see how many of its endpoints (i.e. complement start points) come before bj, and report that as the answer for Ij. If you later encounter the end point of Ij, you are encountering the start point of its complement, so you need to add it back into the set S of active intervals.

This final algorithm is O(n log n) for the same reasons the "easy algorithm" was.

[Update]

One clarification, one correction, one comment...

Clarification: Of course, the "self-balancing binary tree" has to be augmented such that each sub-tree knows how many elements it contains. Otherwise, you cannot answer "how many elements are less than x?" This augmentation is straightforward to maintain, but it is not something that every implementation provides; e.g. the C++ std::set does not, to my knowledge.

Correction: You do not want to add any elements back in to the set S of active intervals; in fact, doing so can result in the wrong answer. For example, if the intervals are just [1,2] and [3,4], you would hit 1 (and remove [1,2] from the set), then 2 (and add it back in again), then 3... And since 2<4, you would conclude that [3,4] contains [1,2]. Which is wrong.

Conceptually, you already processed all of the "start events" for the complement intervals; that is why S begins will all intervals inside of it. So all you need to worry about are the ending points; you do not want to add any elements to S, ever.

Put another way, instead of having the intervals wrap around, you can think of [bi,ai] (where bi > ai) as meaning [bi - infinity, ai] with no wrap-around. The logic still works, but the processing is more clear: First you process all of the "whatever - infinity" terms (i.e. the end points), then you process the others (i.e. the start points).

With this correction, I am pretty sure my solution actually works. This formulation also extends -- I think -- to the case where you have both normal and "backward" intervals together in one input.

Comment: This problem is tricky because if you have to enumerate the set of all intervals contained within every interval, the output itself can be O(n^2). So any working approach has to somehow count the intervals without even being able to identify them :-).

like image 165
Nemo Avatar answered Nov 20 '22 21:11

Nemo


Here is a O(N*LOG(N)):

let Ii = Interval i = (ai, bi)

let L = list of intervals I

sort L by ai

divide L in half into L1a and L2a.

sort L1a and L2a by bi to get L1b and L2b

merge sort L1b and L2b keeping track of the count of nestings (e.g. because all intervals in L1b start before intervals in L2b, when we find an endpoint in L1b that is higher than an endpoint in l2b, we know everything between them is nested inside - think about it)..

Now you have updated the counts on how often an interval in L2 is nested inside an interval in L1.

after merging L1 and L2, we repeat the process (recursion) by dividing L1 into L11a and l12a, also dividing L2 into L21a and L21a..

like image 2
robert king Avatar answered Nov 20 '22 23:11

robert king