Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combine rows when the end time of one is the start time of another (Oracle)

I just can't seem to get this query figured out. I need to combine rows of time-consecutive states into a single state.

This question is similar to the question found here except I am working with Oracle 10 not SQL Server: Combine rows when the end time of one is the start time of another

Example data:

name      start_inst         end_inst            code     subcode
Person1 9/12/2011 10:55 9/12/2011 11:49           161   50
Person1 9/12/2011 11:49 9/12/2011 11:55           107   28
Person1 9/12/2011 11:55 9/12/2011 12:07           161   50
Person1 9/12/2011 12:07 9/12/2011 12:26           161   50
Person1 9/12/2011 12:26 9/12/2011 12:57           161   71
Person1 9/12/2011 12:57 9/12/2011 13:07           161   71
Person1 9/12/2011 13:07 9/12/2011 13:20            52   50

And I would like to get the following output:

name       start_inst       end_inst            code     subcode
Person1 9/12/2011 10:55     9/12/2011 11:49     161     50
Person1 9/12/2011 11:49     9/12/2011 11:55     107     28
Person1 9/12/2011 11:55     9/12/2011 12:26     161     50
Person1 9/12/2011 12:26     9/12/2011 13:07     161     71
Person1 9/12/2011 13:07     9/12/2011 13:20     52      50

Here is example SQL:

CREATE TABLE Data (
    name varchar2(132 BYTE) not null,
    start_inst DATE not null,
    end_inst DATE not null,    
code number(3) not null,
subcode number(3) not null
);
INSERT INTO Data(name,start_inst,end_inst, code, code2) VALUES('Person1','9/12/2011 10:55','9/12/2011 11:49',161, 50);
INSERT INTO Data(name,start_inst,end_inst, code, code2) VALUES('Person1','9/12/2011 11:49','9/12/2011 11:55',107,28);
INSERT INTO Data(name,start_inst,end_inst, code, code2) VALUES('Person1','9/12/2011 11:55','9/12/2011 12:07',161,50);
INSERT INTO Data(name,start_inst,end_inst, code, code2) VALUES('Person1','9/12/2011 12:07','9/12/2011 12:26',161,50);
INSERT INTO Data(name,start_inst,end_inst, code, code2) VALUES('Person1','9/12/2011 12:26','9/12/2011 12:57',161,71);
INSERT INTO Data(name,start_inst,end_inst, code, code2) VALUES('Person1','9/12/2011 12:57','9/12/2011 13:07',161,71);
INSERT INTO Data(name,start_inst,end_inst, code, code2) VALUES('Person1','9/12/2011 13:07','9/12/2011 13:20',52,50);

Thanks in advance!

like image 297
ScottCollier Avatar asked Sep 14 '11 17:09

ScottCollier


4 Answers

Maybe this? (I don't have a SQL machine to run it on)

WITH
  sequenced_data AS
(
  SELECT
    ROW_NUMBER() OVER (PARTITION BY name                ORDER BY start_inst) NameSequenceID,
    ROW_NUMBER() OVER (PARTITION BY name, code, subcode ORDER BY start_inst) NameStateSequenceID,
    *
  FROM
    data
)
SELECT
  name,
  MIN(start_inst) start_inst,
  MAX(end_inst)   end_inst,
  code,
  subcode
FROM
  sequenced_data
GROUP BY
  name,
  code,
  subcode,
  NameSequenceID - NameStateSequenceID
like image 138
MatBailie Avatar answered Nov 15 '22 08:11

MatBailie


Here is a solution using a recursive query instead of analytic functions (as suggested by @wildplasser):

SELECT   name, code, subcode, MIN(start_inst) AS start_inst, MAX(end_inst) AS end_inst
FROM     (SELECT     name,
                     start_inst,
                     end_inst,
                     code,
                     subcode,
                     MIN(CONNECT_BY_ROOT (start_inst)) AS root_start
          FROM       data d
          CONNECT BY PRIOR name = name 
                 AND PRIOR end_inst = start_inst 
                 AND PRIOR code = code 
                 AND PRIOR subcode = subcode
          GROUP BY   name, start_inst, end_inst, code, subcode)
GROUP BY name, code, subcode, root_start;

The connect by clause in the innermost query causes the data to be returned in a hierarchical fashion. connect_by_root gives us the value at the root of each branch. Because we don't have a good candidate for a start with clause, we'll get all child rows (where end_inst equals another row's start_inst and all other columns are the same) multiple times: once as a root and once (or more) as a branch. Taking the min of the root eliminates these extra rows while giving us a value to group on in the outer query.

In the outer query, we perform another group by to consolidate the rows. The difference is that, in this case, we have root_start there as well to identify which rows are consecutive and therefore need to be consolidated.

like image 25
Allan Avatar answered Nov 15 '22 08:11

Allan


Here's another approach:

SELECT
    name,
    min(start_inst) AS start_inst,
    max(end_inst) AS end_inst,
    code,
    subcode
FROM
    (
        SELECT
            A.*,
            COUNT
            (
                CASE WHEN start_inst = previous_end_inst THEN NULL
                ELSE 1
                END
            )
            OVER
            (
                ORDER BY
                    start_inst,
                    name,
                    code,
                    subcode
            ) AS group_number
        FROM
            (
                SELECT
                    name,
                    start_inst,
                    end_inst,
                    LAG
                    (
                      end_inst
                    )
                    OVER
                    (
                        PARTITION BY
                            name,
                            code,
                            subcode
                        ORDER BY
                            start_inst
                    ) AS previous_end_inst,
                    code,
                    subcode
                FROM
                    data
            ) A
        ) B
GROUP BY
    name,
    code,
    subcode,
    group_number
ORDER BY
    group_number

Basically:

  1. For each row, subquery A finds the previous end time for the given name, code, and subcode.

  2. For each row, subquery B calculates the "group number" -- a running count of preceeding rows (in order of start_inst, name, code, and subcode) where the previous end time calculated in Step 1 is not equal to the start time.

  3. The outer query aggregates by group number.

For better or worse, this approach, unlike @stevo's, will create a new "group" if there's a "gap" between the end time of one record and the start time of the next. For example, if you were to create a gap between 12:57 and 13:00 like this...

UPDATE data
SET start_inst = TO_DATE('9/12/2011 13:00', 'MM/DD/YYYY HH24:MI')
WHERE start_inst = TO_DATE('9/12/2011 12:57', 'MM/DD/YYYY HH24:MI');

...the query above would return two rows like this...

NAME                 START_INST       END_INST               CODE    SUBCODE
-------------------- ---------------- ---------------- ---------- ----------
.
.
.
Person1              09/12/2011 12:26 09/12/2011 12:57        161         71
Person1              09/12/2011 13:00 09/12/2011 13:07        161         71
.
.
.

...whereas @stevo's query would return one row like this...

NAME                 START_INST       END_INST               CODE    SUBCODE
-------------------- ---------------- ---------------- ---------- ----------
.
.
.
Person1              12/09/2011 12:26 12/09/2011 13:07        161         71
.
.
.

Hope this helps.

like image 23
Brian Camire Avatar answered Nov 15 '22 06:11

Brian Camire


adapting desm's query, I think this should work

WITH
  sequenced_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY name                ORDER BY start_inst) NameSequenceID,
ROW_NUMBER() OVER (PARTITION BY name, code, subcode ORDER BY start_inst)     NameStateSequenceID,
d.*
FROM
data d
) 
SELECT
  name,
  to_char(MIN(start_inst),'DD/MM/YYYY HH24:MI') start_inst,
  to_char(MAX(end_inst),'DD/MM/YYYY HH24:MI')   end_inst,
  code,
  subcode
FROM
  sequenced_data
GROUP BY
  name,
  code,
  subcode,
  NameSequenceID - NameStateSequenceID
ORDER BY name,start_inst
like image 21
steve godfrey Avatar answered Nov 15 '22 06:11

steve godfrey