Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to intersect two arrays of ranges?

Let a range be an array of two integers: the start and the end (e.g. [40, 42]).

Having two arrays of ranges (which are sorted), I want to find the optimal way to calculate their intersection (which will result into another array of ranges):

A = [[1, 3], [7, 9], [12, 18]]
B = [[2, 3], [4,5], [6,8], [13, 14], [16, 17]]

Intersection:

[[2, 3], [7, 8], [13, 14], [16, 17]]

What is the optimal algorithm for this?


The naive way would be to check each one with all the other ones, but that's obviously not optimal.

I found a similar question asking for the same thing in VBA: Intersection of two arrays of ranges

like image 779
Ionică Bizău Avatar asked Mar 23 '18 16:03

Ionică Bizău


3 Answers

Since the input arrays are sorted, this should be fairly straightforward to work out. I'm assuming that the ranges in any one input array don't intersect one another (otherwise, "which are sorted" would be ambiguous). Consider one range from each array (defined by "current range" indexes a and b). There are several cases (each case other than "full overlap" has a mirror image where A and B are reversed):

No intersection:

A[a]: |------|
B[b]:          |---|

Because the arrays are sorted, A[a] cannot intersect anything in B, so it can be skipped (increment a).

Partial overlap (B[b] extends beyond A[a]):

A[a]: |-------|
B[b]:      |-------|

In this case, add the intersection to the output and then increment a because A[a] cannot intersect anything else in B.

Containment (possibly with coinciding ends):

A[a]: |------|
B[b]:   |--|

Again add the intersection to the output and this time increment b. Note that a further slight optimization is that if A[a] and B[b] end at the same value, then you can increment b as well, since B[b] also cannot intersect anything else in A. (The case of coinciding ends could have been lumped into the partial overlap case. This case could then have been called "strict containment".)

Full overlap:

A[a]: |------|
B[b]: |------|

Add the intersection to the output and increment both a and b (neither range can intersect anything else in the other array).

Continue iterating the above until either a or b runs off the end of the corresponding array and you're done.

It should be trivial straightforward to translate the above to code.

EDIT: To back up that last sentence (okay, it wasn't trivial), here's my version of the above in code. It's a little tedious because of all the cases, but each branch is quite straightforward.

const A = [[1, 3], [7, 9], [12, 18]];
const B = [[2, 3], [4, 5], [6, 8], [13, 14], [16, 17]];

const merged = [];

var i_a = 0,
    i_b = 0;

while (i_a < A.length && i_b < B.length) {
  const a = A[i_a];
  const b = B[i_b];

  if (a[0] < b[0]) {
    // a leads b
    if (a[1] >= b[1]) {
      // b contained in a
      merged.push([b[0], b[1]]);
      i_b++;
      if (a[1] === b[1]) {
        // a and b end together
        i_a++;
      }
    } else if (a[1] >= b[0]) {
      // overlap
      merged.push([b[0], a[1]]);
      i_a++;
    } else {
      // no overlap
      i_a++;
    }
  } else if (a[0] === b[0]) {
    // a and b start together
    if (a[1] > b[1]) {
      // b contained in a
      merged.push([a[0], b[1]]);
      i_b++;
    } else if (a[1] === b[1]) {
      // full overlap
      merged.push([a[0], a[1]]);
      i_a++;
      i_b++;
    } else /* a[1] < b[1] */ {
      // a contained in b
      merged.push([a[0], a[1]]);
      i_a++;
    }
  } else /* a[0] > b[0] */ {
    // b leads a
    if (b[1] >= a[1]) {
      // containment: a in b
      merged.push([a[0], b[1]]);
      i_a++;
      if (b[1] === a[1]) {
        // a and b end together
        i_b++;
      }
    } else if (b[1] >= a[0]) {
      // overlap
      merged.push([a[0], b[1]]);
      i_b++
    } else {
      // no overlap
      i_b++;
    }
  }
}
console.log(JSON.stringify(merged));

You asked for an optimal algorithm. I believe mine is very close to optimal. It runs in linear time with the number of ranges in the two arrays, since each iteration completes the processing of at least one range (and sometimes two). It requires constant memory plus the memory required to build the result.

I should note that unlike the answer by CertainPerformance (the only other answer posted here at the time I'm writing this) my code works for any kind of numeric range data, not just integers. (You might want to replace === with == in the above if you're mixing numbers and string representations of numbers). The algorithm by CertainPerformance flattens the ranges into arrays of consecutive integers that span the ranges. If that total number of integers is n, then his algorithm runs in O(n2) time and O(n) space. (So, for instance, if one of the ranges were [1, 50000], that would require memory for 50,000 numbers and time proportional to the square of that.)

like image 200
Ted Hopp Avatar answered Oct 18 '22 03:10

Ted Hopp


The idea suggested by @Ted Hopp could be implemented in fewer lines of code as follows:

var A = [[1, 3], [7, 9], [12, 18]];
var B = [[2, 3], [4, 5], [6, 8], [13, 14], [16, 17]];

var result = [];
var ai = 0, alength = A.length, ax, ay;
var bi = 0, blength = B.length, bx, by;
while (ai < alength && bi < blength) {
  ax = A[ai][0];
  ay = A[ai][1];
  bx = B[bi][0];
  by = B[bi][1];
  if (ay < bx) {
    // a ends before b
    ai++;
  } else if (by < ax) {
    // b ends before a
    bi++;
  } else {
    // a overlaps b
    result.push([ax > bx ? ax : bx, ay < by ? ay : by]);
    // the smaller range is considered processed
    if (ay < by) {
      ai++;
    } else {
      bi++;
    }
  }
}
console.log(result);

Below is a comprehensive test with large arrays:

var A = [];
var B = [];
var R = [];
(function(rangeArray1, rangeArray2, bruteForceResult) {
  // create random, non-overlapping, sorted ranges
  var i, n, x, y;
  for (i = 0, n = 0; i < 1000; i++) {
    x = n += Math.floor(Math.random() * 100) + 1;
    y = n += Math.floor(Math.random() * 100);
    rangeArray1.push([x, y]);
  }
  for (i = 0, n = 0; i < 1000; i++) {
    x = n += Math.floor(Math.random() * 100) + 1;
    y = n += Math.floor(Math.random() * 100);
    rangeArray2.push([x, y]);
  }
  // calculate intersections using brute force
  rangeArray1.forEach(function(a) {
    rangeArray2.forEach(function(b) {
      if (b[1] >= a[0] && a[1] >= b[0]) {
        bruteForceResult.push([Math.max(a[0], b[0]), Math.min(a[1], b[1])]);
      }
    });
  });
})(A, B, R);

var result = [];
var ai = 0, alength = A.length, ax, ay;
var bi = 0, blength = B.length, bx, by;
while (ai < alength && bi < blength) {
  ax = A[ai][0];
  ay = A[ai][1];
  bx = B[bi][0];
  by = B[bi][1];
  if (ay < bx) {
    // a ends before b
    ai++;
  } else if (by < ax) {
    // b ends before a
    bi++;
  } else {
    // a overlaps b
    result.push([ax > bx ? ax : bx, ay < by ? ay : by]);
    // the smaller range is considered processed
    if (ay < by) {
      ai++;
    } else {
      bi++;
    }
  }
}
console.log(JSON.stringify(R) === JSON.stringify(result) ? "test passed" : "test failed");
like image 4
Salman A Avatar answered Oct 18 '22 02:10

Salman A


Pretty straightforward, just a decent amount of code to write. Flatten a and b into individual elements instead of ranges, find their intersection, and turn it back into an array of ranges again.

const a = [[1, 3], [7, 9], [12, 18]];
const b = [[2, 3], [4,5], [6,8], [13, 14], [16, 17]];
const rangeToArr = ([start, end]) => Array.from({ length: end - start + 1 }, (_, i) => start + i);
const flat = inputArr => inputArr.reduce((arr, elm) => arr.concat(...elm), []);
const aRange = flat(a.map(rangeToArr));
const bRange = flat(b.map(rangeToArr));
const intersection = aRange.filter(num => bRange.includes(num));
console.log(intersection);


// Have the intersection of elements
// now we have to turn the intersection back into an array of ranges again:
const { partialIntersectionRange, thisRangeStarted, lastNum }
= intersection.reduce(({ partialIntersectionRange, thisRangeStarted, lastNum }, num) => {
  // Initial iteration only: populate with initial values
  if (typeof thisRangeStarted !== 'number') {
    return { partialIntersectionRange, thisRangeStarted: num, lastNum: num };
  }
  // If this element is a continuation of the range from the last element
  // then just increment lastNum:
  if (lastNum + 1 === num) {
    return { partialIntersectionRange, thisRangeStarted, lastNum: num };
  }
  // This element is not a continuation of the previous range
  // so make a range out of [thisRangeStarted, lastNum] and push it to the range array
  // (in case thisRangeStarted === lastNum, only push a single value)
  if (thisRangeStarted !== lastNum) partialIntersectionRange.push([thisRangeStarted, lastNum]);
  else partialIntersectionRange.push([thisRangeStarted]);
  return { partialIntersectionRange, thisRangeStarted: num, lastNum: num };
}, { partialIntersectionRange: [] });
if (thisRangeStarted !== lastNum) partialIntersectionRange.push([thisRangeStarted, lastNum]);
else partialIntersectionRange.push([thisRangeStarted]);

console.log(JSON.stringify(partialIntersectionRange));

The difficulty isn't the intersection logic, but getting it formatted the desired way.

like image 2
CertainPerformance Avatar answered Oct 18 '22 02:10

CertainPerformance