Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combine continuous datetime intervals by type

Say we have such a table:

declare @periods table (
    s date, 
    e date,
    t tinyint
);

with date intervals without gaps ordered by start date (s)

insert into @periods values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

All date intervals have different types (t).

It is required to combine date intervals of the same type where they are not broken by intervals of the other types (having all intervals ordered by start date).

So the result table should look like:

      s     |      e     |  t
------------|------------|-----
 2013-01-01 | 2013-01-02 |  3
 2013-01-02 | 2013-01-05 |  1
 2013-01-05 | 2013-01-08 |  2
 2013-01-08 | 2013-01-09 |  1

Any ideas how to do this without cursor?


I've got one working solution:

declare @periods table (
    s datetime primary key clustered, 
    e datetime,
    t tinyint,
    period_number int   
);

insert into @periods (s, e, t) values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

declare @t tinyint = null;  
declare @PeriodNumber int = 0;
declare @anchor date;

update @periods
    set  period_number = @PeriodNumber, 
    @PeriodNumber = case
                        when @t <> t
                            then  @PeriodNumber + 1
                        else
                            @PeriodNumber
                    end,
    @t = t,
    @anchor = s
option (maxdop 1);

select 
    s = min(s),
    e = max(e),
    t = min(t)
from 
    @periods    
group by 
    period_number
order by 
    s;

but I doubt if I can rely on such a behavior of UPDATE statement?

I use SQL Server 2008 R2.


Edit:

Thanks to Daniel and this article: http://www.sqlservercentral.com/articles/T-SQL/68467/

I found three important things that were missed in the solution above:

  1. There must be clustered index on the table
  2. There must be anchor variable and call of the clustered column
  3. Update statement should be executed by one processor, i.e. without parallelism

I've changed the above solution in accordance with these rules.

like image 296
Vadim Loboda Avatar asked Feb 19 '13 16:02

Vadim Loboda


2 Answers

Since your ranges are continuous, the problem essentially becomes a gaps-and-islands one. If only you had a criterion to help you to distinguish between different sequences with the same t value, you could group all the rows using that criterion, then just take MIN(s), MAX(e) for every group.

One method of obtaining such a criterion is to use two ROW_NUMBER calls. Consider the following query:

SELECT
  *,
  rnk1 = ROW_NUMBER() OVER (               ORDER BY s),
  rnk2 = ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
FROM @periods
;

For your example it would return the following set:

s           e           t   rnk1  rnk2
----------  ----------  --  ----  ----
2013-01-01  2013-01-02  3   1     1
2013-01-02  2013-01-04  1   2     1
2013-01-04  2013-01-05  1   3     2
2013-01-05  2013-01-06  2   4     1
2013-01-06  2013-01-07  2   5     2
2013-01-07  2013-01-08  2   6     3
2013-01-08  2013-01-09  1   7     3

The interesting thing about the rnk1 and rnk2 rankings is that if you subtract one from the other, you will get values that, together with t, uniquely identify every distinct sequence of rows with the same t:

s           e           t   rnk1  rnk2  rnk1 - rnk2
----------  ----------  --  ----  ----  -----------
2013-01-01  2013-01-02  3   1     1     0
2013-01-02  2013-01-04  1   2     1     1
2013-01-04  2013-01-05  1   3     2     1
2013-01-05  2013-01-06  2   4     1     3
2013-01-06  2013-01-07  2   5     2     3
2013-01-07  2013-01-08  2   6     3     3
2013-01-08  2013-01-09  1   7     3     4

Knowing that, you can easily apply grouping and aggregation. This is what the final query might look like:

WITH partitioned AS (
  SELECT
    *,
    g = ROW_NUMBER() OVER (               ORDER BY s)
      - ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
  FROM @periods
)
SELECT
  s = MIN(s),
  e = MAX(e),
  t
FROM partitioned
GROUP BY
  t,
  g
;

If you like, you can play with this solution at SQL Fiddle.

like image 133
Andriy M Avatar answered Oct 21 '22 07:10

Andriy M


How about this?

declare @periods table (
    s datetime primary key, 
    e datetime,
    t tinyint,
    s2 datetime
);

insert into @periods (s, e, t) values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

update @periods set s2 = s;

while @@ROWCOUNT > 0
begin
    update p2 SET s2=p1.s
    from @periods p1
    join @PERIODS P2 ON p2.t = p1.t AND p2.s2 = p1.e;
end

select s2 as s, max(e) as e, min(t) as t
from @periods
group by s2
order by s2;
like image 23
GilM Avatar answered Oct 21 '22 06:10

GilM