Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

algorithm challenge: merging date range

I'm facing an interesting problem:

  • I have several date ranges that can overlap
  • each of them has a name

Is it possible to "des-overlap" theses ranges? That is, to generate:

  • a new set of ranges where none overlaps the others
  • each of this new range has a list of corresponding names

Maybe I can make this a bit more graphical. This is what I have first:

a   |------------------------------|
b                    |-------------------|
c          |-----------------|

This is what I would like to obtain:

    |------|---------|-------|-----|-----|
        a      a,c     a,b,c   a,b    b

I found a solution that kind of works, but which is not elegant:

  1. I transform each range (from, to) into a list of days (d1, d2, d3, etc.)
  2. I group names by day
  3. I aggregate groups that contain the same names to recreate ranges

Can you think of a better solution? I'm working with C# but any language independent idea would be much appreciated. Thanks!

like image 575
b. austen Avatar asked Jul 01 '10 07:07

b. austen


4 Answers

I would

  1. Keep an unordered list of "open" ranges
  2. Start from day 1, and add the first range to the open ranges list.
  3. Move until the next range boundary (be it start or close). Create your first "output" range: starting from day 1, ending on that day. In it are the items in your open ranges list.
  4. If the range encountered is already in the open ranges list, remove it. Otherwise, add it.
  5. Repeat 3 and 4, moving along the line.

I definitely did a bad job of explaining it. I'll be writing some code for this soon. But until then, here's what would happen in your solution:

a   |------------------------------|
b                    |-------------------|
c          |-----------------|
1.  Start at day 1, add A to open ranges list, which now contains [A]
2.  Move to the start of C.  First RESULT RANGE: A range from Day 1 to C's start,
      with a value A. (what is in your open ranges list)
3.  Add C to the open ranges list, which now contains [A,C]
4.  Move to the start of B.  Second RESULT RANGE: A range from C's start to B's
      start, with a value A,C. (what is in your open ranges list)
5.  Add B to the open ranges list, which now contains [A,C,B]
6.  Move to the end of C.  Third RESULT RANGE: A range from B's start to C's end,
      with a value of A,C,B.
7.  Remove C from the open ranges list, which now contains [A,B]
8.  Move to the end of A.  Fourth RESULT RANGE: A range from C's end to A's end,
      with a value of A,B
9.  Remove A from the open ranges list, which now contains [B]
10. Move to the end of A.  Fourth RESULT RANGE: A range from A's end to B's end,
      with a value of B

RESULT RANGES
1. from Day 1 to C's start: A
2. from C's start to B's start: A,C
3. from B's start to C's end: A,C,B
4. from C's end to A's end: A,B
5. from A's end to B's end: B

Alternative Method

You could do this:

  1. Keep a ordered list of "output ranges". This makes it easy to test if a point is within a range, and also what ranges follow eachother.
  2. Take a range from your input ranges.
  3. Check to see if it is completely before or completely after all output ranges, and handle accordingly. Skip the next steps and go back to step 2, if so.
  4. Compare its start point to the output ranges
  5. If it is before any other output range, add a new output range from its start to the start of the first output range.
  6. If this is in between an already-existing output range, split that output range at that point. The first part would hold the same "parents", and the second part would have the same parents + the new input range.
  7. Now, compare its end point to the output ranges.
  8. If it is after any other output range, add a new output range from the end of the last output range to its end.
  9. If this is in between an already-existing output range, split that output range at that point. The second part would hold the same "parents", and the first part would have the same parents + the new input range
  10. Add the current input range as a part to all ranges in between the two "processed" ranges in steps 6 and 9, if any.
  11. Repeat 2-6 for all input ranges.

Here are the first few steps, using the sample data + another range D: ("processed" ranges indicated by **double stars**)

a   |------------------------------|
b                    |-------------------|
c          |-----------------|
d       |------------------------|

1.
   Process A
   Output ranges: |--------------A---------------|
2.
   Process B
     - Start of B is in **A**; split A in two:
                  |-------A-------|------AB------|
     - End of B is after any output range range;
         creating new output range at end
                  |-------A-------|------AB------|---B---|
    - Nothing was/is in between **A** and (nothing)
3.
   Process C
     - Start of C is in **A**; split A in two:
                  |---A----|--AC--|------AB------|---B---|
     - End of C is in **AB**; split AB in two:
                  |---A----|--AC--|--ABC--|--AB--|---B---|
     - There were/are no ranges between **A** and **AB**

4.
   Process D
     - Start of D is in **A**; split A in two:
                  |-A-|-AD-|--AC--|--ABC--|--AB--|---B---|
     - End of D is in **AB**; split AB in two:
                  |-A-|-AD-|--AC--|--ABC--|ABD|AB|---B---|
     - Ranges AC and ABC were/are in between **A** and **AB**
                  |-A-|-AD-|--ACD-|-ABCD--|ABD|AB|---B---|

Final output:     |-A-|-AD-|--ACD-|-ABCD--|ABD|AB|---B---|
like image 89
Justin L. Avatar answered Nov 19 '22 18:11

Justin L.


You may want to have look at Interval Trees.

like image 44
sfussenegger Avatar answered Nov 19 '22 17:11

sfussenegger


I have code that does this. It relies on the input-set being ordered by from and then to (ie. if it was SQL, it would be ORDER BY from_value, to_value), but after that it is quite optimal.

My implementation basically does what @Justin L.'s answer describes, so if you just want a textual description, look at his answer for the algorithm.

The code is here: LVK.DataStructures, and the files you want to look at are:

  • Range.cs
  • RangeExtensions.cs, in particular the Slice method at line 206 and onwards.

To use it:

List<Range<DateTime>> ranges = ...
var slices = ranges.Slice();

This will give you one new range for each slice, and for each such range you would have a .Data property containing references back to the contributing ranges.

ie. for your original example, you would get exactly what you want, individual ranges, with the contributing ranges a, b, c etc. in the .Data property.

The classes might use other portions of my class library, which is all there. If you decide to use it, just copy out the portions you want to use.

If you're only interested in the implementation, it can be found in the RangeExtensions.cs file, line 447 and onwards, InternalSlice method.

like image 2
Lasse V. Karlsen Avatar answered Nov 19 '22 17:11

Lasse V. Karlsen


You could:

  1. sort the list of all dates (combining the from and to dates)
  2. then running along this list, each new range would be from one date to the next start or end date that is different from the preceding.

For naming the new ranges, it would make sense to have the list of original range names that the current new range contains, and each time you hit an end date, remove the old range name from the list, and each tiem you hit a start date, add it's name to the list.

like image 1
Frank Avatar answered Nov 19 '22 18:11

Frank