Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Joining together consecutive date validity intervals

I have a series of records containing some information (product type) with temporal validity.

I would like to meld together adjacent validity intervals, provided that the grouping information (the product type) stays the same. I cannot use a simple GROUP BY with MIN and MAX, because some product types (A, in the example) can "go away" and "come back".

Using Oracle 11g.

A similar question for MySQL is: How can I do a contiguous group by in MySQL?

Input data:

| PRODUCT |                       START_DATE |                         END_DATE |
|---------|----------------------------------|----------------------------------|
|       A |      July, 01 2013 00:00:00+0000 |      July, 31 2013 00:00:00+0000 |
|       A |    August, 01 2013 00:00:00+0000 |    August, 31 2013 00:00:00+0000 |
|       A | September, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
|       B |   October, 01 2013 00:00:00+0000 |   October, 31 2013 00:00:00+0000 |
|       B |  November, 01 2013 00:00:00+0000 |  November, 30 2013 00:00:00+0000 |
|       A |  December, 01 2013 00:00:00+0000 |  December, 31 2013 00:00:00+0000 |
|       A |   January, 01 2014 00:00:00+0000 |   January, 31 2014 00:00:00+0000 |
|       A |  February, 01 2014 00:00:00+0000 |  February, 28 2014 00:00:00+0000 |
|       A |     March, 01 2014 00:00:00+0000 |     March, 31 2014 00:00:00+0000 |

Expected results:

| PRODUCT |                      START_DATE |                         END_DATE |
|---------|---------------------------------|----------------------------------|
|       A |     July, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
|       B |  October, 01 2013 00:00:00+0000 |  November, 30 2013 00:00:00+0000 |
|       A | December, 01 2013 00:00:00+0000 |     March, 31 2014 00:00:00+0000 |

See the complete SQL Fiddle.

like image 342
Danilo Piazzalunga Avatar asked Mar 21 '23 10:03

Danilo Piazzalunga


2 Answers

This is a gaps-and-islands problem. There are various ways to approach it; this uses lead and lag analytic functions:

select distinct product,
  case when start_date is null then lag(start_date)
    over (partition by product order by rn) else start_date end as start_date,
  case when end_date is null then lead(end_date)
    over (partition by product order by rn) else end_date end as end_date
from (
  select product, start_date, end_date, rn
  from (
    select t.product,
      case when lag(end_date)
          over (partition by product order by start_date) is null
        or lag(end_date)
          over (partition by product order by start_date) != start_date - 1
        then start_date end as start_date,
      case when lead(start_date)
          over (partition by product order by start_date) is null
        or lead(start_date)
          over (partition by product order by start_date) != end_date + 1
        then end_date end as end_date,
      row_number() over (partition by product order by start_date) as rn
    from t
  )
  where start_date is not null or end_date is not null
)
order by start_date, product;

PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13  30-SEP-13 
B       01-OCT-13  30-NOV-13 
A       01-DEC-13  31-MAR-14 

SQL Fiddle

The innermost query looks at the preceding and following records for the product, and only retains the start and/or end time if the records are not contiguous:

select t.product,
  case when lag(end_date)
      over (partition by product order by start_date) is null
    or lag(end_date)
      over (partition by product order by start_date) != start_date - 1
    then start_date end as start_date,
  case when lead(start_date)
      over (partition by product order by start_date) is null
    or lead(start_date)
      over (partition by product order by start_date) != end_date + 1
    then end_date end as end_date
from t;

PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13            
A                            
A                  30-SEP-13 
A       01-DEC-13            
A                            
A                            
A                  31-MAR-14 
B       01-OCT-13            
B                  30-NOV-13 

The next level of select removes those which are mid-period, where both dates were blanked by the inner query, which gives:

PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13            
A                  30-SEP-13 
A       01-DEC-13            
A                  31-MAR-14 
B       01-OCT-13            
B                  30-NOV-13 

The outer query then collapses those adjacent pairs; I've used the easy route of creating duplicates and then eliminating them with distinct, but you can do it other ways, like putting both values into one of the pairs of rows and leaving both values in the other null, and then eliminating those with another layer of select, but I think distinct is OK here.

If your real-world use case has times, not just dates, then you'll need to adjust the comparison in the inner query; rather than +/- 1, an interval of 1 second perhaps, or 1/86400 if you prefer, but depends on the precision of your values.

like image 174
Alex Poole Avatar answered Mar 30 '23 00:03

Alex Poole


It seems like there should be an easier way, but a combination of an analytical query (to find the different gaps) and a hierarchical query (to connect the rows that are continuous) works:

with data as (
    select 'A' product, to_date('7/1/2013', 'MM/DD/YYYY') start_date, to_date('7/31/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('8/1/2013', 'MM/DD/YYYY') start_date, to_date('8/31/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('9/1/2013', 'MM/DD/YYYY') start_date, to_date('9/30/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'B' product, to_date('10/1/2013', 'MM/DD/YYYY') start_date, to_date('10/31/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'B' product, to_date('11/1/2013', 'MM/DD/YYYY') start_date, to_date('11/30/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('12/1/2013', 'MM/DD/YYYY') start_date, to_date('12/31/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('1/1/2014', 'MM/DD/YYYY') start_date, to_date('1/31/2014', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('2/1/2014', 'MM/DD/YYYY') start_date, to_date('2/28/2014', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('3/1/2014', 'MM/DD/YYYY') start_date, to_date('3/31/2014', 'MM/DD/YYYY') end_date from dual
),
start_points as
(
    select product, start_date, end_date, prior_end+1, case when prior_end + 1 = start_date then null else 'Y' end start_point 
    from (
        select product, start_date, end_date, lag(end_date,1) over (partition by product order by end_date) prior_end
        from data
    )
)
select product, min(start_date) start_date, max(end_date) end_date
from (
    select product, start_date, end_date, level, connect_by_root(start_date) root_start
    from start_points
    start with start_point = 'Y'
    connect by prior end_date = start_date - 1
    and prior product = product
)
group by product, root_start;



PRODUCT START_DATE END_DATE 
------- ---------- ---------
A       01-JUL-13  30-SEP-13
A       01-DEC-13  31-MAR-14
B       01-OCT-13  30-NOV-13
like image 20
Craig Avatar answered Mar 29 '23 23:03

Craig