Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

difference of two sets of intervals

i'm trying to write some code to calculate the difference of two sets of intervals A - B , Interval endpoints are integers , but i'm strugling to come with efficient solution , any suggestion would be much appreciated example : [(1, 4), (7, 9)] - [(3,5)] = [(1, 3), (7, 9)]

well this the best i tried so far ( the two lists are already sorted )

class tp():
   def __repr__(self):
       return '(%.2f,%.2f)' % (self.start, self.end)
   def __init__(self,start,end): 
       self.start=start
       self.end=end



z=[tp(3,5)] #intervals to be subtracted
s=[tp(1, 4)),tp(7, 9), tp(3,4),tp(4,6)]

for x in s[:]:
   if z.end < x.start:
    break
   elif z.start < x.start and z.end > x.start and z.end < x.end:
    x.start=z.end
   elif z.start < x.start and z.end > x.end:
    s.remove(x)
   elif z.start > x.start and z.end < x.end:
    s.append(tp(x.start,z.start))
    s.append(tp(z.end,x.end))
    s.remove(x)
   elif z.start > x.start and z.start < x.end and z.end > x.end:
    x.end=z.start
   elif z.start > x.end:
    continue
like image 556
desprategstreamer Avatar asked Nov 18 '13 23:11

desprategstreamer


People also ask

How do you find the difference between two sets?

The difference of the sets A and B in this order is the set of elements which belong to A but not to B. Symbolically, we write A – B and read as “ A minus B”. The representation of A – B using a Venn diagram is given below. Also, note that A – B is not equal to B – A, i.e. A – B ≠ B – A.

What is the difference between sets and intervals?

Hint: The difference between set and interval is that an interval is a set that consists of all real numbers between a given pair of numbers. An endpoint of an interval is either of the two points that mark the end of the line segment.

What is an interval in sets?

In mathematics, a (real) interval is a set of real numbers that contains all real numbers lying between any two numbers of the set. For example, the set of numbers x satisfying 0 ≤ x ≤ 1 is an interval which contains 0, 1, and all numbers in between.

How do you subtract a set from set B?

The set A−B consists of elements that are in A but not in B. For example if A={1,2,3} and B={3,5}, then A−B={1,2}.


2 Answers

The only way to make the operation efficient is to keep the interval lists sorted and non-overlapping (which can be done in O(n log n)). [See Notes, below].

With both lists sorted and non-overlapping, any set operation (union, intersection, difference, symmetric difference) can be performed with a simple merge.

The merge operation is straightforward: simultaneously loop over the endpoints of both arguments, in order. (Note that the endpoints of each interval list are sorted because we require that the intervals not overlap.) For each endpoint discovered, decide whether it is in the result or not. If the result currently has an odd number of endpoints and the new endpoint is not in the result, add it to the result; similarly, if the result currently has an even number of endpoints and the new endpoint is in the result, add it to the result. At the end of this operation, the result is a list of endpoints, alternating between interval start and interval end.

Here it is in python:

# If using python 3, uncomment the following:
# from functools import reduce

# In all of the following, the list of intervals must be sorted and 
# non-overlapping. We also assume that the intervals are half-open, so
# that x is in tp(start, end) iff start <= x and x < end.

def flatten(list_of_tps):
  """Convert a list of intervals to a list of endpoints"""
  return reduce(lambda ls, ival: ls + [ival.start, ival.end],
                list_of_tps,
                [])

def unflatten(list_of_endpoints):
  """Convert a list of endpoints, with an optional terminating sentinel,
     into a list of intervals"""
  return [tp(list_of_endpoints[i], list_of_endpoints[i + 1])
          for i in range(0, len(list_of_endpoints) - 1, 2)]

def merge(a_tps, b_tps, op):
  """Merge two lists of intervals according to the boolean function op"""
  a_endpoints = flatten(a_tps)
  b_endpoints = flatten(b_tps)

  sentinel = max(a_endpoints[-1], b_endpoints[-1]) + 1
  a_endpoints += [sentinel]
  b_endpoints += [sentinel]

  a_index = 0
  b_index = 0

  res = []

  scan = min(a_endpoints[0], b_endpoints[0])
  while scan < sentinel:
    in_a = not ((scan < a_endpoints[a_index]) ^ (a_index % 2))
    in_b = not ((scan < b_endpoints[b_index]) ^ (b_index % 2))
    in_res = op(in_a, in_b)

    if in_res ^ (len(res) % 2): res += [scan]
    if scan == a_endpoints[a_index]: a_index += 1
    if scan == b_endpoints[b_index]: b_index += 1
    scan = min(a_endpoints[a_index], b_endpoints[b_index])

  return unflatten(res)

def interval_diff(a, b):
  return merge(a, b, lambda in_a, in_b: in_a and not in_b)

def interval_union(a, b):
  return merge(a, b, lambda in_a, in_b: in_a or in_b)

def interval_intersect(a, b):
  return merge(a, b, lambda in_a, in_b: in_a and in_b)

Notes

  1. The intervals [a, b) and [b, c) are not overlapping since they are disjoint; b belongs only to the second one. But the union of these two intervals should still be [a,c). But for the purposes of the functions in this answer, we should also require intervals to not be adjacent. The extend non-overlapping to include the case where the intervals are adjacent; otherwise, we risk finding the adjacency point needlessly included in the output. (That's not strictly speaking wrong, but it's easier to test functions if there output is deterministic.)

    Here's a sample implementation of a function which normalises an arbitrary list of intervals into a sorted, non-overlapping interval.

    def interval_normalise(a):
        rv = sorted(a, key = lambda x: x.start)
        out = 0
        for scan in range(1, len(rv)):
            if rv[scan].start > rv[out].end:
                if rv[out].end > rv[out].start: out += 1
                rv[out] = rv[scan]
            elif rv[scan].end > rv[out].end:
                rv[out] = tp(rv[out].start, rv[scan].end)
        if rv and rv[out].end > rv[out].start: out += 1
        return rv[:out]
    
like image 65
rici Avatar answered Sep 22 '22 15:09

rici


This can be solved with a sweep line algorithm. The idea is to keep all the starting points of intervals from both sets in one sorted array and end points in other sorted array marking them with information that they belong to which set. e.g.

       A              B
[(1, 4), (7, 9)] - [(3,5)]
A: start:[1,7] end:[4,9], B: start:[3]end:[5]
start:[(1,a),(3,b),(7,a)]
end: [(4,a),(5,b),(9,a)]

Now have two pointer one to beginning of each array.In a loop increment one which points to lowest value adding intervals which start with a till they end with b or a. e.g. for above we will iterate points in this order

(1,a) (3,b) (4,a) (5,b) (7,a) (9,a)
# and adding intervals where we have seen an start a and an end a or b
(1,3) (7,9)

This leads to an linear solution in terms of number of intervals.

like image 44
FUD Avatar answered Sep 26 '22 15:09

FUD