Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the maximally intersecting subset of ranges

If you have a set of ranges, such as the following simple example...

[
    [12, 25], #1
    [14, 27], #2
    [15, 22], #3
    [17, 21], #4
    [20, 65], #5
    [62, 70], #6
    [64, 80]  #7
]

... how do you compute the maximally intersecting subset (not sure quite how to phrase it, but I mean "the subset of ranges which intersects and has the highest cardinality") and determine the degree of intersection (cardinality of ranges in that subset)?

Logically I can work it out, and might be able to translate that to a naive algorithm. Going down the list, we see that 1-5 intersect, and 5-7 intersect, and that #5 intersects both sets.

The result I want is simply the subset, since that gives me the information about the cardinality, and I can easily compute the intersection of the set as long as they all intersect. In the above example, it would be [[14, 27],[15, 22],[12, 25],[17, 21],[20, 65]].

Off the top of my head, I might try converting each range to a graph node, connecting the ones which are intersecting, and finding the largest fully-connected graph.

I was also thinking iteratively to start at the beginning, continue building up a list of intersecting ranges with a running intersection on each to check against—until you hit an element which doesn't intersect, then start a new list. Continue checking each item against the existing intersections. However I'm not sure this is complete.

I could take a stab at implementing something (lang is ruby FWIW), but I would love to hear how others might solve this problem, and what the most efficient and elegant way might be.

Update:

I believe this is a specific case of the Maximum clique problem, which is NP-hard and thus actually difficult. Suggestions for approximations/real-world use would be most appreciated!

See also: http://en.wikipedia.org/wiki/Maximum_clique / Find all complete sub-graphs within a graph

Update 2

Found a nice proof of this problem's NP-hardness and NP-completeness here: http://www.cs.bris.ac.uk/~popa/ipl.pdf

Looks like this is the end of the line then. Sorry folks! I'll work with a good-enough greedy approximation. Thanks.

As said in the answers I don't think that paper describes this problem... we probably have more information here based on the ranges.

like image 769
trisweb Avatar asked Feb 21 '13 22:02

trisweb


People also ask

How do you find the maximum number of intersections?

Naive Approach: The simplest approach to solve the given problem is to iterate over all segments and for each segment count the number of intersections by checking it with all other segments and then print the maximum among all the count of intersections obtained.

How do you find the intersection of a range?

An intersection is an interval that lies within all of the given intervals. If no such intersection exists then print -1. [5, 6] is the common interval that lies in all the given intervals. No intersection exists between the two given ranges.

How do you find the intersection of two intervals in C++?

just iterate over the range [l, r] to get the intersection values.


2 Answers

If I understand the problem correctly, it is not an instance of the NP problem described in the paper you linked to. Here is my understanding of the problem, and a polynomial-time solution.

  1. We are given a finite set of ranges of real numbers, say n: [A1, B1], [A2, B2], ..., [An, Bn], where Ai ≤ Bi.

  2. Create a sorted list of the starting and ending points, ordered numerically, indicating whether the point is a starting or ending point.

In your example, this would be: 12+, 14+, 15+, 17+, 20+, 21-, 22-, 25-, 27-, 62+, 64+, 65-, 70-, 80-

  1. Initialize curOverlap and maxOverlap to zero.

  2. Iterate through the list, incrementing curOverlap for each + and decrementing it for each -. Set maxOverlap = max(curOverlap, maxOverlap) on each increment.

To continue your example:
val, cur, max
12, 1, 1
14, 2, 2
15, 3, 3
17, 4, 4
20, 5, 5
21, 4, 5
22, 3, 5
25, 2, 5
27, 1, 5
62, 2, 5
64, 3, 5
65, 2, 5
70, 1, 5
80, 0, 5

The max overlap is 5. You could also store the val associated with the max if you wanted to know where the max overlap occurred. That would give you 20. in this example. It's then trivial to go through the initial set of ranges and find the 5 which include 20.

-edit- If you have repeated values, count the plusses before the minuses for each value so that you include ranges that overlap at a single point.

like image 72
Dave Avatar answered Nov 12 '22 04:11

Dave


Here is a Java implementation of the solution proposed by Dave:

public static List<Range> getMaxIntersectingSubset(List<Range> ranges) {
    // First, we collect "a sorted list of the starting and ending points".
    // We're sorting such that all lower bounds come first
    List<Bound> intersections = Arrays.stream(ranges)
        .flatMap(arr -> Stream.of(Bound.ofLowerBound(arr[0]), Bound.ofUpperBound(arr[1])))
        .sorted(Comparator
            .comparing(Bound::value)
            .thenComparing((a, b) -> Boolean.compare(!a.isLowerBound(), b.isLowerBound())))
        )
        .collect(Collectors.toList());

    long curOverlap = 0;
    long maxOverlap = 0;
    List<Integer> indexes = new ArrayList<>();

    // Then we iterate the list, searching for the highest overlap
    for (int i = 0; i < intersections.size(); i++) {
        Bound intersection = intersections.get(i);
        curOverlap += (intersection.isLowerBound() ? 1 : -1);
        if (curOverlap > maxOverlap) {
            maxOverlap = curOverlap;
            indexes.clear();
            indexes.add(i);
        }
        else if (curOverlap == maxOverlap) {
            indexes.add(i);
        }
    }

    return indexes.stream()
        .map(index -> Range.of(intersections.get(index).value(), intersections.get(index + 1).value()))
        .collect(Collectors.toList());
}
public record Bound(int value, boolean isLowerBound) {

    public static Bound ofLowerBound(int value) {
        return new Bound(value, true);
    }

    public static Bound ofUpperBound(int value) {
        return new Bound(value, false);
    }
}
public record Range(int start, int end) {

    private Range(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start > end");
        }
        this.start = start;
        this.end = end;
    }

    public static Range of(int start, int end) {
        return new Range(start, end);
    }
}

These are some test sets:

  1. From this Stackoverflow question

    int[][] ints = {
        { 1, 100 },
        { 30, 95 },
        { 10, 60 },
        { 15, 25 },
        { 33, 66 },
        { 20, 50 },
        { 51, 100 },
        { 25, 70 }
    };
    

    According to OP of abovementioned question, the result should be [ 30, 55 ]. The provided code outputs this result. However, there's another overlap, although less wide. See pictures below.

    Test case #1a

    Test case #1b

  2. OP of this question mentions this list:

    int[][] ints = {
        { 12, 25 },
        { 14, 27 },
        { 15, 22 },
        { 17, 21 },
        { 20, 65 },
        { 62, 70 },
        { 64, 80 }
    };
    

    Here, the output is [ 20, 21 ]. This matches the below diagram:

    Diagram of test case #2

like image 4
MC Emperor Avatar answered Nov 12 '22 06:11

MC Emperor