Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Divide rows with date in SQL Server 2014

I have a problem with SQL. I have the following table:

declare @t table (START_DATE datetime,
                  END_DATE datetime, 
                  GROSS_SALES_PRICE decimal(10,2)
                 );

insert into @t 
values ('2014-08-06 00:00:00.000', '2014-10-06 23:59:59.000', 29.99),
       ('2014-09-06 00:00:00.000', '2014-09-09 23:59:59.000', 32.99),
       ('2014-09-10 00:00:00.000', '2014-09-30 23:59:59.000', 32.99),
       ('2014-10-07 00:00:00.000', '2049-12-31 23:59:59.000', 34.99)

I would like to separate the dates which overlaps. For example I have in the first row START_DATE 2014-08-06 and END_DATE 2014-10-06. We can see that the dates from the second and the third row are inside this period of time from first row.

So I would like to separate them as follows:

declare @t2 table (START_DATE datetime,
                   END_DATE datetime, 
                   GROSS_SALES_PRICE decimal(10,2)
                  );

insert into @t2 
values ('2014-08-06 00:00:00.000', '2014-09-05 23:59:59.000', 29.99),
       ('2014-09-06 00:00:00.000', '2014-09-09 23:59:59.000', 32.99),
       ('2014-09-10 00:00:00.000', '2014-09-30 23:59:59.000', 32.99),
       ('2014-10-01 00:00:00.000', '2014-10-06 23:59:59.000', 29.99),
       ('2014-10-07 00:00:00.000', '2049-12-31 23:59:59.000', 34.99)

So the second and the third rows remained unchanged. The first row should have new END_DATE. We also have new row. The GROSS_SALES_PRICE should remain as it is in internal period. Thanks for help. I am using SQL Server 2014

like image 975
Rafał Kobiela Avatar asked Jun 06 '17 14:06

Rafał Kobiela


2 Answers

A calendar/dates table can simplify this, but we can also use a query to generate a temporary dates table using a common table expression.

From there, we can solve this as a gaps and islands style problem. Using the dates table and using outer apply() to get the latest values for start_date and gross_sales_price we can identify the groups we want to re-aggregate by using two row_number()s. The first just ordered by date, less the other that is partitioned by the value we have as the latest start_date and ordered by date.

Then you can dump the results of the common table expression src to a temporary table and do your inserts/deletes using that or you can use merge using src.

/* -- dates --*/
declare @fromdate datetime, @thrudate datetime;
select  @fromdate = min(start_date), @thrudate = max(end_date) from #t;
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
  select top (datediff(day, @fromdate, @thrudate)+1) 
      [Date]=convert(datetime,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
    , [End_Date]=convert(datetime,dateadd(millisecond,-3,dateadd(day,row_number() over(order by (select 1)),@fromdate)))
  from n as deka cross join n as hecto cross join n as kilo
                 cross join n as tenK cross join n as hundredK
   order by [Date]
)
/* -- islands -- */
, cte as (
select 
    start_date = d.date
  , end_date   = d.end_date
  , x.gross_sales_price
  , grp = row_number() over (order by d.date)
        - row_number() over (partition by x.start_date order by d.date)
from dates d
  outer apply (
    select top 1 l.start_date, l.gross_sales_price
    from #t l
    where d.date >= l.start_date
      and d.date <= l.end_date
    order by l.start_date desc
    ) x
)
/* -- aggregated islands -- */
, src as (
select 
    start_date = min(start_date)
  , end_date   = max(end_date)  
  , gross_sales_price
from cte
group by gross_sales_price, grp
)
/* -- merge -- */
merge #t with (holdlock) as target
using src as source
  on target.start_date = source.start_date
 and target.end_date   = source.end_date
 and target.gross_sales_price = source.gross_sales_price
when not matched by target 
  then insert (start_date, end_date, gross_sales_price)
    values (start_date, end_date, gross_sales_price)
when not matched by source 
  then delete
output $action, inserted.*, deleted.*;
/* -- results -- */
select 
    start_date
  , end_date  
  , gross_sales_price
from #t 
order by start_date

rextester demo: http://rextester.com/MFXCQQ90933

merge output (you do not need to output this, just showing for the demo):

+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+
| $action |     START_DATE      |      END_DATE       | GROSS_SALES_PRICE |     START_DATE      |      END_DATE       | GROSS_SALES_PRICE |
+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+
| INSERT  | 2014-10-01 00:00:00 | 2014-10-06 23:59:59 | 29.99             | NULL                | NULL                | NULL              |
| INSERT  | 2014-08-06 00:00:00 | 2014-09-05 23:59:59 | 29.99             | NULL                | NULL                | NULL              |
| DELETE  | NULL                | NULL                | NULL              | 2014-08-06 00:00:00 | 2014-10-06 23:59:59 | 29.99             |
+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+

results:

+-------------------------+-------------------------+-------------------+
|       start_date        |        end_date         | gross_sales_price |
+-------------------------+-------------------------+-------------------+
| 2014-08-06 00:00:00.000 | 2014-09-05 23:59:59.997 | 29.99             |
| 2014-09-06 00:00:00.000 | 2014-09-09 23:59:59.997 | 32.99             |
| 2014-09-10 00:00:00.000 | 2014-09-30 23:59:59.997 | 32.99             |
| 2014-10-01 00:00:00.000 | 2014-10-06 23:59:59.997 | 29.99             |
| 2014-10-07 00:00:00.000 | 2049-12-31 23:59:59.997 | 34.99             |
+-------------------------+-------------------------+-------------------+

calendar and numbers tables reference:

  • Generate a set or sequence without loops 2- Aaron Bertrand
  • Creating a Date Table/Dimension in SQL Server 2008 - David Stein
  • Calendar Tables - Why You Need One - David Stein
  • Creating a date dimension or calendar table in SQL Server - Aaron Bertrand

merge reference:

  • Use Caution with SQL Server''s MERGE Statement - Aaron Bertrand
  • UPSERT Race Condition With Merge - Dan Guzman
  • An Interesting MERGE Bug - Paul White
  • Can I optimize this merge statement - Aaron Bertrand
  • If you are using indexed views and MERGE, please read this! - Aaron Bertrand
  • The Case of the Blocking Merge Statement (LCK_M_RS_U locks) - Kendra Little
  • Writing t-sql merge statements the right way - David Stein
like image 186
SqlZim Avatar answered Oct 26 '22 12:10

SqlZim


In addition to using datetime2 type instead of datetime, I'd recommend you to use [Closed; Open) intervals instead of [Closed; Closed]. In other words, use 2014-08-06 00:00:00.000, 2014-09-06 00:00:00.000 instead of 2014-08-06 00:00:00.000, 2014-09-05 23:59:59.000. Specifically, because 59.999 will be rounded to 00.000 for the datetime type, but will not for datetime2(3). You don't want to depend on such internal details of the data types.

Also, [Closed; Open) intervals are much easier to deal with in the queries as you'll see below.

The main idea is to put all start and end dates (boundaries) together in one list with a flag that indicates whether it is a beginning or end of the interval. When a running total of the flag turns into zero, it means that all overlapping intervals have ended.

Sample data

I extended your sample data with several cases of overlapping intervals.

declare @t table 
(START_DATE datetime2(0),
END_DATE datetime2(0), 
GROSS_SALES_PRICE decimal(10,2)
);

insert into @t 
values 
-- |------|                                    11
('2001-01-01 00:00:00', '2001-01-10 00:00:00', 11),

-- |------|                                    10
--     |------|                                20
('2010-01-01 00:00:00', '2010-01-10 00:00:00', 10),
('2010-01-05 00:00:00', '2010-01-20 00:00:00', 20),

-- |----------|                                30
--     |------|                                40
('2010-02-01 00:00:00', '2010-02-20 00:00:00', 30),
('2010-02-05 00:00:00', '2010-02-20 00:00:00', 40),

-- |----------|                                50
-- |----------|                                60
('2010-03-01 00:00:00', '2010-03-20 00:00:00', 50),
('2010-03-01 00:00:00', '2010-03-20 00:00:00', 60),

-- |----------|                                70
--   |------|                                  80
('2010-04-01 00:00:00', '2010-04-20 00:00:00', 70),
('2010-04-05 00:00:00', '2010-04-15 00:00:00', 80),

-- |-----------------------------|             29.99
--      |---------|                            32.99
--                |---------|                  32.99
--                               |----------|  34.99
('2014-08-06 00:00:00', '2014-10-07 00:00:00', 29.99),
('2014-09-06 00:00:00', '2014-09-10 00:00:00', 32.99),
('2014-09-10 00:00:00', '2014-10-01 00:00:00', 32.99),
('2014-10-07 00:00:00', '2050-01-01 00:00:00', 34.99);

Query

WITH
CTE_Boundaries
AS
(
    SELECT
        START_DATE AS dt
        ,+1 AS Flag
        ,GROSS_SALES_PRICE AS Price
    FROM @T

    UNION ALL

    SELECT
        END_DATE AS dt
        ,-1 AS Flag
        ,GROSS_SALES_PRICE AS Price
    FROM @T
)
,CTE_Intervals
AS
(
    SELECT
        dt
        ,Flag
        ,Price
        ,SUM(Flag) OVER (ORDER BY dt, Flag ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SumFlag
        ,LEAD(dt) OVER (ORDER BY dt, Flag) AS NextDate
        ,LEAD(Price) OVER (ORDER BY dt, Flag) AS NextPrice
    FROM CTE_Boundaries
)
SELECT
    dt AS StartDate
    ,NextDate AS EndDate
    ,CASE WHEN Flag = 1 THEN Price ELSE NextPrice END AS Price
FROM CTE_Intervals
WHERE
    SumFlag > 0
    AND dt <> NextDate
ORDER BY StartDate
;

Result

+---------------------+---------------------+-------+
|      StartDate      |       EndDate       | Price |
+---------------------+---------------------+-------+
| 2001-01-01 00:00:00 | 2001-01-10 00:00:00 | 11.00 |
| 2010-01-01 00:00:00 | 2010-01-05 00:00:00 | 10.00 |
| 2010-01-05 00:00:00 | 2010-01-10 00:00:00 | 20.00 |
| 2010-01-10 00:00:00 | 2010-01-20 00:00:00 | 20.00 |
| 2010-02-01 00:00:00 | 2010-02-05 00:00:00 | 30.00 |
| 2010-02-05 00:00:00 | 2010-02-20 00:00:00 | 40.00 |
| 2010-03-01 00:00:00 | 2010-03-20 00:00:00 | 60.00 |
| 2010-04-01 00:00:00 | 2010-04-05 00:00:00 | 70.00 |
| 2010-04-05 00:00:00 | 2010-04-15 00:00:00 | 80.00 |
| 2010-04-15 00:00:00 | 2010-04-20 00:00:00 | 70.00 |

this is your sample data:

| 2014-08-06 00:00:00 | 2014-09-06 00:00:00 | 29.99 |
| 2014-09-06 00:00:00 | 2014-09-10 00:00:00 | 32.99 |
| 2014-09-10 00:00:00 | 2014-10-01 00:00:00 | 32.99 |
| 2014-10-01 00:00:00 | 2014-10-07 00:00:00 | 29.99 |
| 2014-10-07 00:00:00 | 2050-01-01 00:00:00 | 34.99 |
+---------------------+---------------------+-------+

Intermediary result of CTE_Intervals

Examine these to understand how the query works

+---------------------+------+-------+---------+---------------------+-----------+
|         dt          | Flag | Price | SumFlag |      NextDate       | NextPrice |
+---------------------+------+-------+---------+---------------------+-----------+
| 2001-01-01 00:00:00 |    1 | 11.00 |       1 | 2001-01-10 00:00:00 | 11.00     |
| 2001-01-10 00:00:00 |   -1 | 11.00 |       0 | 2010-01-01 00:00:00 | 10.00     |
| 2010-01-01 00:00:00 |    1 | 10.00 |       1 | 2010-01-05 00:00:00 | 20.00     |
| 2010-01-05 00:00:00 |    1 | 20.00 |       2 | 2010-01-10 00:00:00 | 10.00     |
| 2010-01-10 00:00:00 |   -1 | 10.00 |       1 | 2010-01-20 00:00:00 | 20.00     |
| 2010-01-20 00:00:00 |   -1 | 20.00 |       0 | 2010-02-01 00:00:00 | 30.00     |
| 2010-02-01 00:00:00 |    1 | 30.00 |       1 | 2010-02-05 00:00:00 | 40.00     |
| 2010-02-05 00:00:00 |    1 | 40.00 |       2 | 2010-02-20 00:00:00 | 30.00     |
| 2010-02-20 00:00:00 |   -1 | 30.00 |       1 | 2010-02-20 00:00:00 | 40.00     |
| 2010-02-20 00:00:00 |   -1 | 40.00 |       0 | 2010-03-01 00:00:00 | 50.00     |
| 2010-03-01 00:00:00 |    1 | 50.00 |       1 | 2010-03-01 00:00:00 | 60.00     |
| 2010-03-01 00:00:00 |    1 | 60.00 |       2 | 2010-03-20 00:00:00 | 50.00     |
| 2010-03-20 00:00:00 |   -1 | 50.00 |       1 | 2010-03-20 00:00:00 | 60.00     |
| 2010-03-20 00:00:00 |   -1 | 60.00 |       0 | 2010-04-01 00:00:00 | 70.00     |
| 2010-04-01 00:00:00 |    1 | 70.00 |       1 | 2010-04-05 00:00:00 | 80.00     |
| 2010-04-05 00:00:00 |    1 | 80.00 |       2 | 2010-04-15 00:00:00 | 80.00     |
| 2010-04-15 00:00:00 |   -1 | 80.00 |       1 | 2010-04-20 00:00:00 | 70.00     |
| 2010-04-20 00:00:00 |   -1 | 70.00 |       0 | 2014-08-06 00:00:00 | 29.99     |
| 2014-08-06 00:00:00 |    1 | 29.99 |       1 | 2014-09-06 00:00:00 | 32.99     |
| 2014-09-06 00:00:00 |    1 | 32.99 |       2 | 2014-09-10 00:00:00 | 32.99     |
| 2014-09-10 00:00:00 |   -1 | 32.99 |       1 | 2014-09-10 00:00:00 | 32.99     |
| 2014-09-10 00:00:00 |    1 | 32.99 |       2 | 2014-10-01 00:00:00 | 32.99     |
| 2014-10-01 00:00:00 |   -1 | 32.99 |       1 | 2014-10-07 00:00:00 | 29.99     |
| 2014-10-07 00:00:00 |   -1 | 29.99 |       0 | 2014-10-07 00:00:00 | 34.99     |
| 2014-10-07 00:00:00 |    1 | 34.99 |       1 | 2050-01-01 00:00:00 | 34.99     |
| 2050-01-01 00:00:00 |   -1 | 34.99 |       0 | NULL                | NULL      |
+---------------------+------+-------+---------+---------------------+-----------+
like image 1
Vladimir Baranov Avatar answered Oct 26 '22 13:10

Vladimir Baranov