I have a list of dates organized like this:
(From, To)
(From, To)
...
(From, To)
I am trying to find how to consolidate ranges in an efficient way (it has to be quite fast because it is to consolidate financial data streams in realtime).
Dates do NOT overlap.
what I was thinking about is:
Sort everything by From time and then iterate through pairs to see if Pair1.To == Pair2.From to merge them, but this means several iterations.
Is there a better way to do this, like in a single pass
Here are some examples
(2019-1-10, 2019-1-12)
(2019-3-10, 2019-3-14)
(2019-1-12, 2019-1-13)
expected output:
(2019-1-10, 2019-1-12) + (2019-1-12, 2019-1-13) -> (2019-1-10, 2019-1-13)
(2019-3-10, 2019-3-14) -> (2019-3-10, 2019-3-14)
In practice, it's really about seconds and not dates, but the idea is the same.
You mention that dates never overlap but I think it is slightly simpler to write code that just merges overlapping dates. First step is to define the date range type:
class Interval
{
public DateTime From { get; set; }
public DateTime To { get; set; }
}
You can then define an extension method that checks if two intervals overlap:
static class IntervalExtensions
{
public static bool Overlaps(this Interval interval1, Interval interval2)
=> interval1.From <= interval2.From
? interval1.To >= interval2.From : interval2.To >= interval1.From;
}
Notice that this code assumes that From <= To
so you might want to change Interval
into an immutable type and verify this in the constructor.
You also need a way to merge two intervals:
public static Interval MergeWith(this Interval interval1, Interval interval2)
=> new Interval
{
From = new DateTime(Math.Min(interval1.From.Ticks, interval2.From.Ticks)),
To = new DateTime(Math.Max(interval1.To.Ticks, interval2.To.Ticks))
};
Next step is define another extension method that iterates a sequence of intervals and tries to merge consecutive overlapping intervals. This is best done using an iterator block:
public static IEnumerable<Interval> MergeOverlapping(this IEnumerable<Interval> source)
{
using (var enumerator = source.GetEnumerator())
{
if (!enumerator.MoveNext())
yield break;
var previousInterval = enumerator.Current;
while (enumerator.MoveNext())
{
var nextInterval = enumerator.Current;
if (!previousInterval.Overlaps(nextInterval))
{
yield return previousInterval;
previousInterval = nextInterval;
}
else
{
previousInterval = previousInterval.MergeWith(nextInterval);
}
}
yield return previousInterval;
}
}
If two consecutive intervals don't overlap it yields the previous interval. However, if they overlap it instead updates the previous interval by merging the two intervals and keep the merged interval as the previous interval for the next iteration.
Your sample data is not sorted so before merging the intervals you have to sort them:
var mergedIntervals = intervals.OrderBy(interval => interval.From).MergeOverlapping();
However, if the real data is sorted which you have indicated in a comment you can skip the sorting. The algorithm will do a single pass over the data and thus is O(n)
.
Give this a go:
var source = new[]
{
new { from = new DateTime(2019, 1, 10), to = new DateTime(2019, 1, 12) },
new { from = new DateTime(2019, 3, 10), to = new DateTime(2019, 3, 14) },
new { from = new DateTime(2019, 1, 12), to = new DateTime(2019, 1, 13) },
};
var data =
source
.OrderBy(x => x.from)
.ThenBy(x => x.to)
.ToArray();
var results =
data
.Skip(1)
.Aggregate(
data.Take(1).ToList(),
(a, x) =>
{
if (a.Last().to >= x.from)
{
a[a.Count - 1] = new { from = a.Last().from, to = x.to };
}
else
{
a.Add(x);
}
return a;
});
It's a nice query and it gives the output that you want.
Create two Dictionaries (i.e. hash maps), one using the To date as the key and the From-To date as the value, the other with the From date as the key.
Iterate over your date ranges and for each range check if the From date exists as a key in the To-date-keyed Dictionary, and vice versa.
If not a match in either then add the range to both the Dictionaries.
If there is a match in one but not the other then remove the matching range from both Dictionaries (using the appropriate key), merge the new range with the existing range and add the result to both.
If there is a match in both Dictionaries (the range being added fills a hole) then remove both matches from both Dictionaries, merge the three ranges (two existing and one new) and add the result to both Dictionaries.
At the end your Dictionaries contain an unsorted set of all date ranges, which you can extract by iterating over the keys of one of the Dictionaries.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With