Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Overlapping sequences

There are many sequences of start-end pairs. How to find all the ranges which are contained in all of the sequences? Start and end are integers and they may be far away, so making bit-fields of the sequences and &-ing them is not feasible. The ranges (i.e. start-end pairs) on one "row" (i.e. one sequence) do not overlap, if that helps. And there is lower and upper bound for start and end, 32-bits integers would be enough I think (i.e., 0 <= values <= 65535).

Let me draw an example:

|----------|       |---------------------|           |----------|
     |----------------------|                      |-------|
                |---------------------|                 |--|

The result should be:

                   |--------|                           |--|

The example above would read, approximately:

row1 = (100, 200), (300, 600), (800, 900)
row2 = (140, 450), (780, 860)
row3 = (280, 580), (820, 860)
result = (300, 450), (820, 860)

Also, is there any known algorithm for this? I mean, does this problem have a name?

like image 339
Ecir Hana Avatar asked Jan 10 '13 17:01

Ecir Hana


2 Answers

That should not be to hard assuming the ranges in each sequence do not overlap. In this case it is just a matter of iterating over all points and tracking when you enter or leave a range.

Throw all your points from all your sequences into one list, sort it and remember for each point if it is a start or end point.

100 S ---
140 S  |   ---
200 E ---   |
280 S       |  ---
300 S ---   |   |
450 E  |   ---  |
580 E  |       ---
600 E ---
780 S      ---
800 S ---   |
820 S  |    |  ---
860 E  |   ---  |
860 E  |       ---
900 E ---

Now you iterate over this list and every time you encounter a start point you increment a counter, every time you encounter an end point you decrement the counter.

      0
100 S 1
140 S 2
200 E 1
280 S 2  
300 S 3 <--
450 E 2 <--
580 E 1
600 E 0
780 S 1
800 S 2
820 S 3 <--
860 E 2 <--
860 E 1
900 E 0

When the counter is equal to the number of sequences - three in your example - you have found the start of one range and the next point is the end of this range.

Note that it is not even required to build the list explicitly if the ranges in each sequence are sorted by start or can be sorted by start. In this case you can just iterate over all sequences in parallel by keeping a pointers to the current range in each sequence.

Here the whole thing in C# - the class for ranges.

internal sealed class Range
{
    private readonly Int32 start = 0;

    private readonly Int32 end = 0;

    public Range(Int32 start, Int32 end)
    {
        this.start = start;
        this.end = end;
    }

    internal Int32 Start
    {
        get { return this.start; }
    }

    internal Int32 End
    {
        get { return this.end; }
    }
}

The class for points with a flag to discriminate between start and end points.

internal sealed class Point
{
    private readonly Int32 position = 0;

    private readonly Boolean isStartPoint = false;

    public Point(Int32 position, Boolean isStartPoint)
    {
        this.position = position;
        this.isStartPoint = isStartPoint;
    }

    internal Int32 Position
    {
        get { return this.position; }
    }

    internal Boolean IsStartPoint
    {
        get { return this.isStartPoint; }
    }
}

And finally the algorithm and test program.

internal static class Program
{
    private static void Main()
    {
        var s1 = new List<Range> { new Range(100, 200), new Range(300, 600), new Range(800, 900) };
        var s2 = new List<Range> { new Range(140, 450), new Range(780, 860) };
        var s3 = new List<Range> { new Range(280, 580), new Range(820, 860) };

        var sequences = new List<List<Range>> { s1, s2, s3 };

        var startPoints = sequences.SelectMany(sequence => sequence)
                                   .Select(range => new Point(range.Start, true));

        var endPoints   = sequences.SelectMany(sequence => sequence)
                                   .Select(range =>  new Point(range.End, false));

        var points = startPoints.Concat(endPoints).OrderBy(point => point.Position);

        var counter = 0;

        foreach (var point in points)
        {
            if (point.IsStartPoint)
            {
                counter++;

                if (counter == sequences.Count)
                {
                    Console.WriteLine("Start {0}", point.Position);
                }
            }
            else
            {
                if (counter == sequences.Count)
                {
                    Console.WriteLine("End   {0}", point.Position);
                    Console.WriteLine();
                }

                counter--;
            }
        }

        Console.ReadLine();
    }
}

The output is as desired the following.

Start 300
End   450

Start 820
End   860
like image 98
Daniel Brückner Avatar answered Sep 20 '22 16:09

Daniel Brückner


I think you can do that rather simply by fusing the sequences 2 by 2.

Each fusion should be doable in linear time of the number of intervals in the considered sequences (if the sequences are sorted), and M-1 fusions are required (with M number of sequences)

Taking your example and adding an extra sequence:

|----------|       |---------------------|           |----------|
     |----------------------|                      |-------|
                |---------------------|                 |--|
        |-----------------------------------|           |-----|  

Fuse by pair of sequences:

     |-----|       |--------|                        |-----|
                |---------------------|                 |--|

Fuse again:

                   |--------|                           |--|

But you may be able to find a faster way to do it. The worst case has O(N log M) runtime (N total number of intervals).

Edit: Pseudocode for fusion

Take s1 and s2 an iterator on each sequence
While there are still intervals in both sequences
    Compare the intervals:
    If s1.begin < s2.begin
        If s2.begin < s1.end
            If s2.end > s1.end
                Add [s2.begin,s1.end] to the fused sequence
                Increment s1
            Else
                Add [s2.begin,s2.end] to the fused sequence
                Increment s2
        Else
            Increment s1
    Else
        Same thing with s1 and s2 reversed
like image 32
Khaur Avatar answered Sep 20 '22 16:09

Khaur