Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Subtract Overlaps Between Two Ranges Without Sets

NO SETS!

I can't use Sets because:

  • The ranges will be too long.
  • They will take up too much memory
  • The creation of the sets themselves will take too long.

Using only the endpoints of the of the ranges, is there an optimal way to subtract two lists of ranges?

Example:

r1 = (1, 1000), (1100, 1200)  
r2 = (30, 50), (60, 200), (1150, 1300)

r1 - r2 = (1, 29), (51, 59), (201, 1000), (1100, 1149)

Other info:

  • r2 does not have to overlap r1
  • r1 and r2 will not have pairs that overlap other pairs. For instance, r1 will not have both (0,30) and (10, 25)

Thanks.

like image 691
sequenceGeek Avatar asked Jun 24 '11 01:06

sequenceGeek


1 Answers

This was an interesting problem!

I think this is right, and it's fairly compact. It should work with overlapping ranges of all kinds, but it assumes well-formed ranges (i.e. [x, y) where x < y). It uses [x, y) style ranges for simplicity. It's based on the observation that there are really only six possible arrangements (with results in ()):

Edit: I found a more compact representation:

(s1 e1)  s2 e2
(s1 s2)  e1 e2
(s1 s2) (e2 e1)

 s2 e2  (s1 e1)
 s2 s1  (e2 e1)
 s2 s1   e1 e2 ()

Given a sorted list of endpoints, if endpoints[0] == s1 then the first two endpoints should be in the result. If endpoints[3] == e1 then the last two endpoints should be in the result. If neither, then there should be no result.

I haven't tested it a great deal, so it's entirely possible that something is wrong. Please let me know if you find a mistake!

import itertools

def range_diff(r1, r2):
    s1, e1 = r1
    s2, e2 = r2
    endpoints = sorted((s1, s2, e1, e2))
    result = []
    if endpoints[0] == s1 and endpoints[1] != s1:
        result.append((endpoints[0], endpoints[1]))
    if endpoints[3] == e1 and endpoints[2] != e1:
        result.append((endpoints[2], endpoints[3]))
    return result

def multirange_diff(r1_list, r2_list):
    for r2 in r2_list:
        r1_list = list(itertools.chain(*[range_diff(r1, r2) for r1 in r1_list]))
    return r1_list

Tested:

>>> r1_list = [(1, 1001), (1100, 1201)]
>>> r2_list = [(30, 51), (60, 201), (1150, 1301)]
>>> print multirange_diff(r1_list, r2_list)
[(1, 30), (51, 60), (201, 1001), (1100, 1150)]
like image 191
senderle Avatar answered Nov 16 '22 00:11

senderle