Consider the following interface that describes a continuous range of integer
values.
public interface IRange {
int Minimum { get;}
int Maximum { get;}
IRange LargestOverlapRange(IEnumerable<IRange> ranges);
}
I am looking for an efficient algorithm to find the largest overlap range given a list of IRange
objects. The idea is briefly outlined in the following diagram. Where the top numbers represent the integer
values, and the |-----|
represent the IRange
objects with a min and max value. I stacked the IRange
objects so that the solution is easy to visualize.
0123456789 ... N
|-------| |------------| |-----|
|---------| |---|
|---| |------------|
|--------| |---------------|
|----------|
Here, the LargestOverlapRange
method would return:
|---|
Since that range has a total of 4 'overlaps'. If there are two separate IRange
with the same number of overlaps, I want to return null
.
Here is some brief code of what I tried.
public class Range : IRange
{
public IRange LargestOverlapRange(IEnumerable<IRange> ranges) {
int maxInt = 20000;
// Create a histogram of the counts
int[] histogram = new int[maxInt];
foreach(IRange range in ranges) {
for(int i=range.Minimum; i <= range.Maximum; i++) {
histogram[i]++;
}
}
// Find the mode of the histogram
int mode = 0;
int bin = 0;
for(int i =0; i < maxInt; i++) {
if(histogram[i] > mode) {
mode = histogram[i];
bin = i;
}
}
// Construct a new range of the mode values, if they are continuous
Range range;
for(int i = bin; i < maxInt; i++) {
if(histogram[i] == mode) {
if(range != null)
return null; // violates two ranges with the same mode
range = new Range();
range.Minimum = i;
while(i < maxInt && histrogram[i] == mode)
i++;
range.Maximum = i;
}
}
return range;
}
}
This involves four loops and is easily O(n^2) if not higher. Is there a more efficient algorithm (speed wise) to find the largest overlap range from a list of other ranges?
EDIT
Yes, the O(n^2) is not correct, I was thinking about it incorrectly. It should be O(N * M) as was pointed out in the comments.
EDIT 2
Let me stipulate a few things, the absolute min and max values of the integer
values will be from (0, 20000). Secondly, the average number of IRange
will be on the order of 100. I don't know if this will change the way the algorithm is designed.
EDIT 3
I am implementing this algorithm on a scientific instrument (a mass spectrometer) in which the speed of the data processing is paramount to the quality of data (faster analysis time = more spectra collected in time T). The firmware language (proprietary) only has arrays[] and is not object orientated. I choose C# since I am decent at porting concepts between the two languages and thought that in the interest of the SO community, a good answer would have a wider audience.
Convert your list of ranges to a list of start and stop points. Sort the list with an O(n log n) algorithm. Now you can iterate through the list and increment or decrement a counter depending on whether it's a start or stop point, which will give you the current overlap depth.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With