There are many sequences of start-end pairs. How to find all the ranges which are contained in all of the sequences? Start and end are integers and they may be far away, so making bit-fields of the sequences and &
-ing them is not feasible. The ranges (i.e. start-end pairs) on one "row" (i.e. one sequence) do not overlap, if that helps. And there is lower and upper bound for start and end, 32-bits integers would be enough I think (i.e., 0 <= values <= 65535).
Let me draw an example:
|----------| |---------------------| |----------|
|----------------------| |-------|
|---------------------| |--|
The result should be:
|--------| |--|
The example above would read, approximately:
row1 = (100, 200), (300, 600), (800, 900)
row2 = (140, 450), (780, 860)
row3 = (280, 580), (820, 860)
result = (300, 450), (820, 860)
Also, is there any known algorithm for this? I mean, does this problem have a name?
That should not be to hard assuming the ranges in each sequence do not overlap. In this case it is just a matter of iterating over all points and tracking when you enter or leave a range.
Throw all your points from all your sequences into one list, sort it and remember for each point if it is a start or end point.
100 S ---
140 S | ---
200 E --- |
280 S | ---
300 S --- | |
450 E | --- |
580 E | ---
600 E ---
780 S ---
800 S --- |
820 S | | ---
860 E | --- |
860 E | ---
900 E ---
Now you iterate over this list and every time you encounter a start point you increment a counter, every time you encounter an end point you decrement the counter.
0
100 S 1
140 S 2
200 E 1
280 S 2
300 S 3 <--
450 E 2 <--
580 E 1
600 E 0
780 S 1
800 S 2
820 S 3 <--
860 E 2 <--
860 E 1
900 E 0
When the counter is equal to the number of sequences - three in your example - you have found the start of one range and the next point is the end of this range.
Note that it is not even required to build the list explicitly if the ranges in each sequence are sorted by start or can be sorted by start. In this case you can just iterate over all sequences in parallel by keeping a pointers to the current range in each sequence.
Here the whole thing in C# - the class for ranges.
internal sealed class Range
{
private readonly Int32 start = 0;
private readonly Int32 end = 0;
public Range(Int32 start, Int32 end)
{
this.start = start;
this.end = end;
}
internal Int32 Start
{
get { return this.start; }
}
internal Int32 End
{
get { return this.end; }
}
}
The class for points with a flag to discriminate between start and end points.
internal sealed class Point
{
private readonly Int32 position = 0;
private readonly Boolean isStartPoint = false;
public Point(Int32 position, Boolean isStartPoint)
{
this.position = position;
this.isStartPoint = isStartPoint;
}
internal Int32 Position
{
get { return this.position; }
}
internal Boolean IsStartPoint
{
get { return this.isStartPoint; }
}
}
And finally the algorithm and test program.
internal static class Program
{
private static void Main()
{
var s1 = new List<Range> { new Range(100, 200), new Range(300, 600), new Range(800, 900) };
var s2 = new List<Range> { new Range(140, 450), new Range(780, 860) };
var s3 = new List<Range> { new Range(280, 580), new Range(820, 860) };
var sequences = new List<List<Range>> { s1, s2, s3 };
var startPoints = sequences.SelectMany(sequence => sequence)
.Select(range => new Point(range.Start, true));
var endPoints = sequences.SelectMany(sequence => sequence)
.Select(range => new Point(range.End, false));
var points = startPoints.Concat(endPoints).OrderBy(point => point.Position);
var counter = 0;
foreach (var point in points)
{
if (point.IsStartPoint)
{
counter++;
if (counter == sequences.Count)
{
Console.WriteLine("Start {0}", point.Position);
}
}
else
{
if (counter == sequences.Count)
{
Console.WriteLine("End {0}", point.Position);
Console.WriteLine();
}
counter--;
}
}
Console.ReadLine();
}
}
The output is as desired the following.
Start 300
End 450
Start 820
End 860
I think you can do that rather simply by fusing the sequences 2 by 2.
Each fusion should be doable in linear time of the number of intervals in the considered sequences (if the sequences are sorted), and M-1 fusions are required (with M number of sequences)
Taking your example and adding an extra sequence:
|----------| |---------------------| |----------|
|----------------------| |-------|
|---------------------| |--|
|-----------------------------------| |-----|
Fuse by pair of sequences:
|-----| |--------| |-----|
|---------------------| |--|
Fuse again:
|--------| |--|
But you may be able to find a faster way to do it. The worst case has O(N log M) runtime (N total number of intervals).
Edit: Pseudocode for fusion
Take s1 and s2 an iterator on each sequence
While there are still intervals in both sequences
Compare the intervals:
If s1.begin < s2.begin
If s2.begin < s1.end
If s2.end > s1.end
Add [s2.begin,s1.end] to the fused sequence
Increment s1
Else
Add [s2.begin,s2.end] to the fused sequence
Increment s2
Else
Increment s1
Else
Same thing with s1 and s2 reversed
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With