Suppose I have two collections of intervals, named A and B. How would I find a difference (a relative complement) in a most time- and memory-efficient way?
Picture for illustration:
Interval endpoints are integers ( ≤ 2128-1 ) and they are always both 2n long and aligned on the m×2n lattice (so you can make a binary tree out of them).
Intervals can overlap in the input but this does not affect the output (the result if flattened would be the same).
The problem is because there are MANY intervals in both collections (up to 100,000,000), so naïve implementations will probably be slow.
The input is read from two files and it is sorted in such a way that smaller sub-intervals (if overlapping) come immediately after their parents in order of size. For example:
[0,7]
[0,3]
[4,7]
[4,5]
[8,15]
...
So far, I've been working on a implementation that generates a binary search tree while doing so aggregates neighbouring intervals ( [0,3],[4,7] => [0,7]
) from both collections, then traverses the second tree and "bumps out" the intervals that are present in both (subdividing the larger intervals in the first tree if necessary).
While this appears to be working for small collections, it requires more and more RAM to hold the tree itself, not to mention the time it needs to complete the insertion and removal from the tree.
I figured that since intervals come pre-sorted, I could use some dynamic algorithm and finish in one pass. I am not sure if this is possible, however.
So, how would I go about solving this problem in a efficient way?
Disclaimer: This is not a homework but a modification/generalization of an actual real-life problem I am facing. I am programming in C++ but I can accept an algorithm in any [imperative] language.
Recall one of the first programming exercises we all had back in school - writing a calculator program. Taking an arithmetic expression from the input line, parsing it and evaluating. Remember keeping track of the parentheses depth? So here we go.
Analogy: interval start points are opening parentheses, end points - closing parentheses. We keep track of the parentheses depth (nesting). The depth of two - intersection of intervals, the depth of one - difference of intervals
Algorithm:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With