Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Complex SQL query - collapsing date ranges

Tags:

sql

oracle

We have a table that lists start and end dates for "service time". I would like a query that can search through all of the rows and identify any breaks in service based on gaps in the dates.

Data:
Start         End
1/1/2000   2/1/2001
2/2/2001   4/1/2001
4/1/2004   6/2/2006
6/3/2006   9/1/2010
8/1/2011   9/1/2012

Desired result:
1/1/2001 - 4/1/2001     //The first two ranges collapsed because no break in service 
4/1/2004 - 9/1/2010     // The 3rd and 4th rows collapsed because no real break in service 
8/1/2011 - 9/1/2012  

This probably more easily done in app logic or stored proc, just wondering if there is any SQL voodoo that could get me close.

Table definition:
CREATE TABLE CONG_MEMBER_TERM 
(
  CONG_MEMBER_TERM_ID NUMBER(10, 0) NOT NULL 
  , CONGRESS_ID NUMBER(10, 0) NOT NULL 
  , CHAMBER_CD VARCHAR2(30 BYTE) NOT NULL 
  , CONG_MEMBER_ID NUMBER(10, 0) NOT NULL 
  , STATE_CD CHAR(2 BYTE) NOT NULL 
  , CONG_MEMBER_TYPE_CD VARCHAR2(30 BYTE) NOT NULL 
  , DISTRICT NUMBER(10, 0) 
  , START_DT TIMESTAMP(6) WITH TIME ZONE 
  , END_DT TIMESTAMP(6) WITH TIME ZONE 
  , CREATE_DT TIMESTAMP(6) WITH TIME ZONE NOT NULL 
  , UPDATE_DT TIMESTAMP(6) WITH TIME ZONE NOT NULL
)

Insert into CONG_MEMBER_TERM (CONG_MEMBER_TERM_ID,CONGRESS_ID,CHAMBER_CD,CONG_MEMBER_ID,STATE_CD,CONG_MEMBER_TYPE_CD,DISTRICT,START_DT,END_DT,CREATE_DT,UPDATE_DT) values (2945,104,'H',494,'OK','REP',2,to_timestamp_tz('04-JAN-95 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('04-OCT-96 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.47.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.48.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'));
Insert into CONG_MEMBER_TERM (CONG_MEMBER_TERM_ID,CONGRESS_ID,CHAMBER_CD,CONG_MEMBER_ID,STATE_CD,CONG_MEMBER_TYPE_CD,DISTRICT,START_DT,END_DT,CREATE_DT,UPDATE_DT) values (2946,105,'H',494,'OK','REP',2,to_timestamp_tz('07-JAN-97 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('19-DEC-98 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.47.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.49.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'));
Insert into CONG_MEMBER_TERM (CONG_MEMBER_TERM_ID,CONGRESS_ID,CHAMBER_CD,CONG_MEMBER_ID,STATE_CD,CONG_MEMBER_TYPE_CD,DISTRICT,START_DT,END_DT,CREATE_DT,UPDATE_DT) values (2947,106,'H',494,'OK','REP',2,to_timestamp_tz('06-JAN-99 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('15-DEC-00 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.47.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.49.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'));
Insert into CONG_MEMBER_TERM (CONG_MEMBER_TERM_ID,CONGRESS_ID,CHAMBER_CD,CONG_MEMBER_ID,STATE_CD,CONG_MEMBER_TYPE_CD,DISTRICT,START_DT,END_DT,CREATE_DT,UPDATE_DT) values (2948,109,'S',494,'OK','SEN',null,to_timestamp_tz('04-JAN-05 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('09-DEC-06 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.48.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.49.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'));
Insert into CONG_MEMBER_TERM (CONG_MEMBER_TERM_ID,CONGRESS_ID,CHAMBER_CD,CONG_MEMBER_ID,STATE_CD,CONG_MEMBER_TYPE_CD,DISTRICT,START_DT,END_DT,CREATE_DT,UPDATE_DT) values (2949,110,'S',494,'OK','SEN',null,to_timestamp_tz('04-JAN-07 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-JAN-09 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.48.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.49.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'));
Insert into CONG_MEMBER_TERM (CONG_MEMBER_TERM_ID,CONGRESS_ID,CHAMBER_CD,CONG_MEMBER_ID,STATE_CD,CONG_MEMBER_TYPE_CD,DISTRICT,START_DT,END_DT,CREATE_DT,UPDATE_DT) values (2951,111,'S',494,'OK','SEN',null,to_timestamp_tz('06-JAN-09 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('22-DEC-10 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.48.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.49.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'));
Insert into CONG_MEMBER_TERM (CONG_MEMBER_TERM_ID,CONGRESS_ID,CHAMBER_CD,CONG_MEMBER_ID,STATE_CD,CONG_MEMBER_TYPE_CD,DISTRICT,START_DT,END_DT,CREATE_DT,UPDATE_DT) values (2950,112,'S',494,'OK','SEN',null,to_timestamp_tz('05-JAN-11 01.00.00.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),null,to_timestamp_tz('02-MAY-12 09.45.48.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'),to_timestamp_tz('02-MAY-12 09.45.49.000000000 AM -05:00','DD-MON-RR HH.MI.SS.FF AM TZR'));

If the gap between the previous service and the next service is greater then 24 months, it's considered a "gap" in service.

_mike

like image 753
nibeck Avatar asked Oct 07 '22 22:10

nibeck


2 Answers

Here is a standard way to do collapsing time ranges in SQL using analytic functions with an example.

Your table:

SQL> create table mytable (startdate,enddate)
  2  as
  3  select date '2000-01-01', date '2001-02-01' from dual union all
  4  select date '2001-02-02', date '2001-04-01' from dual union all
  5  select date '2004-04-01', date '2006-06-02' from dual union all
  6  select date '2006-06-03', date '2010-09-01' from dual union all
  7  select date '2011-08-01', date '2012-09-01' from dual
  8  /

Table created.

The query:

SQL> select min(startdate) startdate
  2       , max(enddate)   enddate
  3    from ( select startdate
  4                , enddate
  5                , max(rn) over (order by startdate) maxrn
  6             from ( select startdate
  7                         , enddate
  8                         , case lag(enddate) over (order by startdate)
  9                           when startdate-1 then
 10                             null
 11                           else
 12                             rownum
 13                           end rn
 14                      from mytable
 15                  )
 16         )
 17   group by maxrn
 18   order by startdate
 19  /

STARTDATE           ENDDATE
------------------- -------------------
01-01-2000 00:00:00 01-04-2001 00:00:00
01-04-2004 00:00:00 01-09-2010 00:00:00
01-08-2011 00:00:00 01-09-2012 00:00:00

3 rows selected.

It works in three phases:

  1. assign a unique rownum to only those records that are the start of a group
  2. give the records that are not the start of a group the same number as the start of the group (using analytic function MAX with a sliding window)
  3. aggregate by group number

And the real beauty of this query is that only one TABLE ACCESS FULL is needed:

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
SQL_ID  8v1suw8j53tqz, child number 0
-------------------------------------
select min(startdate) startdate      , max(enddate)   enddate   from ( select startdate               , enddate
             , max(rn) over (order by startdate) maxrn            from ( select startdate
, enddate                        , case lag(enddate) over (order by startdate)                          when
startdate-1 then                            null                          else                            rownum
                         end rn                     from mytable                 )        )  group by maxrn
order by startdate

Plan hash value: 2933657513

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   1 |  SORT ORDER BY           |         |      1 |      5 |      3 |00:00:00.01 |       3 |  2048 |  2048 | 2048  (0)|
|   2 |   HASH GROUP BY          |         |      1 |      5 |      3 |00:00:00.01 |       3 |       |       |          |
|   3 |    VIEW                  |         |      1 |      5 |      5 |00:00:00.01 |       3 |       |       |          |
|   4 |     WINDOW BUFFER        |         |      1 |      5 |      5 |00:00:00.01 |       3 |  2048 |  2048 | 2048  (0)|
|   5 |      VIEW                |         |      1 |      5 |      5 |00:00:00.01 |       3 |       |       |          |
|   6 |       WINDOW SORT        |         |      1 |      5 |      5 |00:00:00.01 |       3 |  2048 |  2048 | 2048  (0)|
|   7 |        COUNT             |         |      1 |        |      5 |00:00:00.01 |       3 |       |       |          |
|   8 |         TABLE ACCESS FULL| MYTABLE |      1 |      5 |      5 |00:00:00.01 |       3 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------


24 rows selected.

Regards,
Rob.

like image 128
Rob van Wijk Avatar answered Oct 12 '22 11:10

Rob van Wijk


This can be done using analytics, but I am not sure what you want to come out of the query.

For instance:

drop table mydates;

create table mydates (sdate date, edate date);

insert into mydates values (to_date('2000-01-01' ,'YYYY-MM-DD'), to_date('2000-01-02' ,'YYYY-MM-DD'));

insert into mydates values (to_date('2000-01-02' ,'YYYY-MM-DD'), to_date('2000-02-01' ,'YYYY-MM-DD'));
-- insert a gap
insert into mydates values (to_date('2001-01-01' ,'YYYY-MM-DD'), to_date('2001-01-02' ,'YYYY-MM-DD'));

insert into mydates values (to_date('2001-01-02' ,'YYYY-MM-DD'), to_date('2001-02-01' ,'YYYY-MM-DD'));

Here we have two groups of two rows with a break in service between them. You can find all the rows where the start_date is different to the previous rows end_date easily enough using the lag function. This SQL will give you the start row of each group:

select * 
  from 
  (
    select lag(edate, 1, null) over (order by sdate asc)  as previous_end,
    sdate sd,
    edate ed
    from mydates
  )
where previous_end != sd or previous_end is null;

But I am not sure that is what you want. You can get fancier and collapse all the rows down so you get out one row per continuous group.

It may be useful if you post a more complete example (including the scripts to create the objects and data).

select min(sd) sd, max(ed) ed
from
(
  select max(grp) over (order by sd) grp,
         sd, ed
  from  
  (
    select 
      case 
        when previous_end != sd or previous_end is null then
          rn
        else
          null
      end grp,
      sd, 
      ed
    from 
    (
      select lag(edate, 1, null) over (order by sdate asc)  as previous_end,
      row_number() over (order by sdate asc) as rn,
      sdate sd,
      edate ed
      from mydates
      order by sdate asc
    )
  )
) group by grp
order by sd asc;
like image 38
Stephen ODonnell Avatar answered Oct 12 '22 10:10

Stephen ODonnell