I have date range data in an SQL DB table that has these three (only relevant) columns:

- ID (int identity)
- RangeFrom (date only)
- RangeTo (date only)

For any given date range there may be an arbitrary number of records that overlap (completely or partially). A higher ID (newer record) takes precedence over older records that it may overlap (fully or partially). Ranges are inclusive, so two adjacent, non-overlapping ranges' RangeFrom and RangeTo differ by one day.

Since there's a lot of complex data related to these ranges (lots of joins etc.) and since processor + memory power is much more efficient than the SQL DB engine, I decided to load the overlapping data from the DB into my data layer and do the range chopping/splitting in memory. This gives me much more flexibility as well as speed, in terms of both development and execution.
If you think this would be better handled in the DB, let me know.

I would like to write the fastest and, if at all possible, least resource-hungry conversion algorithm. I get lots of these records, related to various users, so I have to run this algorithm once per user over that user's set of overlapping ranges.

What would be the most efficient (fast and resource-light) way of splitting these overlapping ranges?
Say I have records ID=1 to ID=7 that overlap in this manner (the actual dates are irrelevant; the overlaps are easier to show visually):
6666666666666
44444444444444444444444444 5555555555
2222222222222 333333333333333333333 7777777
11111111111111111111111111111111111111111111111111111111111111111111
Result should look like:
111111166666666666664444444444444444444444333333333555555555511111117777777
The result is effectively what you'd see looking at these overlaps from the top: for each day, the top-down view shows the ID that wins.

The result will then be transformed into new range records, so the old IDs become irrelevant, but their RangeFrom and RangeTo values (along with all related data) will be used:
111111122222222222223333333333333333333333444444444555555555566666667777777
This is of course just an example of overlapping ranges; any given date range can hold anywhere from 0 to X records. And as we can see, range ID=2 got completely overwritten by 4 and 6, so it became completely obsolete.
Overlap = min(A2, B2) - max(A1, B1) + 1. In other words, the overlap of two inclusive integer intervals is the difference between the minimum of the two upper boundaries and the maximum of the two lower boundaries, plus one.

You can detect an overlap by swapping the two ranges up front, if necessary, so the first one starts no later than the second. The ranges then overlap if the second range's start is less than or equal to the first range's end (when ranges include both start and end), or strictly less than it (when ranges include the start but exclude the end).

Put differently, two ranges overlap if they have at least one point in common; non-overlapping ranges have no points in common.
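The overlap formula above can be sketched as two small helpers. This is a minimal illustration in Python (the question is presumably working in C#/.NET, given the mention of int?); the function names are my own:

```python
def overlap_length(a1, a2, b1, b2):
    """Length of the overlap of the inclusive integer intervals [a1, a2]
    and [b1, b2]. Zero or negative means the intervals do not overlap."""
    return min(a2, b2) - max(a1, b1) + 1

def overlaps(a1, a2, b1, b2):
    """True if the two inclusive intervals share at least one common point."""
    return overlap_length(a1, a2, b1, b2) > 0
```

For example, the inclusive intervals [1, 10] and [5, 12] share the six points 5 through 10, so `overlap_length(1, 10, 5, 12)` is 6.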
I've come up with an idea of my own. For the given date range:

1. Create an in-memory array of integers with as many items as there are days in the range.
2. Fill the array with null values. All of them.
3. Order the records by ID in reverse (newest first).
4. Flatten the overlapped ranges by iterating over the ordered records; for each record, write its ID into every array element that falls within the record's range and is still null (a non-null element has already been claimed by a newer record).
5. You end up with an array of flattened ranges, filled with record IDs.
6. Create the new set of records: start a new record whenever the ID in the array changes, and give each new record the data associated with the record ID set in the array.
7. Repeat the whole thing for the next person and their set of overlapped ranges (don't forget to reuse the same array) = go back to step 2.

And that's basically it.
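The whole day-array approach could be sketched roughly as follows. This is a minimal Python sketch under my own assumptions (records modeled as (id, from, to) tuples, result returned as tuples rather than DB rows); the original would presumably be C#/.NET:

```python
from datetime import date, timedelta

def flatten_ranges(records, range_from, range_to):
    """Flatten overlapping inclusive date ranges by painting newer (higher-ID)
    records into a one-slot-per-day array first, so older records can only
    claim days that are still empty; then merge runs of equal IDs.
    `records` is a list of (id, from_date, to_date) tuples."""
    days = (range_to - range_from).days + 1
    slots = [None] * days                       # one slot per day, all null

    # Newest first; a record only claims slots that are still null.
    for rec_id, d_from, d_to in sorted(records, key=lambda r: r[0], reverse=True):
        start = max((d_from - range_from).days, 0)
        end = min((d_to - range_from).days, days - 1)
        for i in range(start, end + 1):
            if slots[i] is None:
                slots[i] = rec_id

    # Emit a new record each time the ID in the array changes.
    result = []
    run_start = 0
    for i in range(1, days + 1):
        if i == days or slots[i] != slots[run_start]:
            if slots[run_start] is not None:    # skip uncovered gaps
                result.append((slots[run_start],
                               range_from + timedelta(days=run_start),
                               range_from + timedelta(days=i - 1)))
            run_start = i
    return result
```

For instance, with record 2 (Jan 3 - Jan 6) laid over record 1 (Jan 1 - Jan 10), the flattened output is three new ranges: 1 (Jan 1 - Jan 2), 2 (Jan 3 - Jan 6), 1 (Jan 7 - Jan 10).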
A 10-year date range requires an array of approx. 3,650 nullable integers, which I think is a rather small memory footprint. A plain integer takes 4 bytes; I don't know exactly how much space a nullable integer (an int plus a bool) occupies, but let's assume 8 bytes, which totals 3650 * 8 = 29,200 bytes, i.e. about 28.5 KB. That can be manipulated in memory easily and rather fast. Since I'm not saving date ranges, splitting or anything similar, these are barely more than assignment operations with an if that checks whether the value has already been set.

A 10-year date range is a rare, exaggerated extreme. 75% of date ranges will be within 3 months or a quarter of a year (90 days * 8 bytes = 720 bytes), and 99% will fall within a whole year (365 * 8 = 2,920 bytes, about 2.85 KB).
I find this algorithm more than appropriate for flattening overlapped date ranges.

To halve the memory footprint I could use int instead of int? and set elements to -1 instead of null.
I could also keep a count of days that aren't set yet; when it reaches 0 I can break the loop early, because all remaining ranges are fully overlapped and wouldn't set any more values in the array. This would speed things up a bit when there are lots of range records (which will be rather rare).
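The early-exit counter could look something like the sketch below. This is an illustrative Python fragment under my own assumptions (records already reduced to (id, start_index, end_index) day-index tuples, -1 marking unset slots as suggested above):

```python
def paint_with_early_exit(records, days):
    """Paint record IDs newest-first into `days` slots. Records are
    (id, start_index, end_index) tuples with inclusive day indices.
    Stops as soon as every slot is claimed, since any remaining (older)
    records are fully overlapped and could not change the array."""
    slots = [-1] * days            # -1 instead of null halves the footprint
    remaining = days               # count of slots not yet claimed
    for rec_id, start, end in sorted(records, key=lambda r: r[0], reverse=True):
        for i in range(max(start, 0), min(end, days - 1) + 1):
            if slots[i] == -1:
                slots[i] = rec_id
                remaining -= 1
        if remaining == 0:         # nothing left to claim: break early
            break
    return slots
```

The counter costs one decrement per claimed slot and one comparison per record, so the optimisation is essentially free even when it never triggers.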