Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get distinct consecutive date ranges from overlapping date ranges

I need to get a list of date ranges that are NOT overlapping with each other from a list of overlapping dates and get the sum of coins during that overlap. I have tried googling for an example but no luck so far. I might not be using the right key words?

I have a list of overlapping dates

1.1.2018 - 31.1.2018 80
7.1.2018 - 10.1.2018 10
7.1.2018 - 31.1.2018 10
11.1.2018 - 31.1.2018 5
25.1.2018 - 27.1.2018 5
2.2.2018 - 23.2.2018 100

Desired outcome would be

1.1.2018 - 6.7.2018 80 coins
7.1.2018 - 10.1.2018 100 coins
11.1.2018 - 24.1.2018 95 coins
25.1.2018 - 27.1.2018 100 coins
28.1.2018 - 31.1.2018 95 coins
2.2.2018 - 23.2.2018 100 coins

Here is a figure how it should work

|------------------------------|
       |---|
       |-----------------------|
           |-------------------|
                      |---|
                                   |----------------------|
Outcome              
|------|---|----------|---|----|   |----------------------|
   80   100     95     100  95                100

This is my test data

drop table coinsonperiod2;
create table coinsonperiod2(
  id serial,
  startdate date,
  enddate date,
  coins integer,
  userid integer
);
insert into coinsonperiod2 (startdate, enddate, coins,userid) values
  ('2018-01-01','2018-01-31', 80,1)
, ('2018-01-07','2018-01-10', 10,1)
, ('2018-01-07','2018-01-31', 10,1)
, ('2018-01-11','2018-01-31', 5,1)
, ('2018-01-25','2018-01-27', 5,1)
, ('2018-02-02','2018-02-23', 100,2)
, ('2018-01-01','2018-01-31', 80,2)
, ('2018-01-07','2018-01-10', 10,2)
, ('2018-01-07','2018-01-31', 10,2)
, ('2018-01-11','2018-01-31', 5,2)
, ('2018-01-25','2018-01-27', 5,2)
, ('2018-02-02','2018-02-23', 100,3)
; 

UPDATE: Actually StephenM's and joops answers do not meet my desired outcome. Both answers show enddate wrong.

When one period ends the next should start next day (or later if there is a gap). In my desired outcome 1.1.2018-6.1.2018 includes the 6th day. There is no gap between 6th and 7th because 7th is included in 7.1.2018-10.1.2018.

UPDATE2: Now I understood what is the difference between open, half open and closed intervals. In joops solution, calculation must be done against half open intervals, but my desired outcome is closed interval. That is why enddate must be reduced to make the outcome as closed interval. Correct me if I am wrong.

I also added userid in the sample data and modified joops solution some more. Here is the query that gives me my desired outcome.

with changes AS (
  SELECT
    userid,
    startdate AS tickdate,
    coins,
    1         AS cover
  FROM coinsonperiod2
  UNION ALL
  -- add 1 day to correct intervals into half open intervals, so the calculation is correct
  SELECT
    userid,
    1 + enddate AS tickdate,
    -1 * coins,
    -1          AS cover
  FROM coinsonperiod2
)
, sumchanges  AS (
    SELECT
      userid,
      tickdate,
      SUM(coins) AS change,
      SUM(cover) AS cover
    FROM changes
    GROUP BY tickdate, userid
)
, aggregated AS (
    SELECT
      userid   AS userid,
      tickdate AS startdate,
      lead(tickdate)
      over www AS enddate,
      sum(change)
      OVER www AS cash,
      sum(cover)
      OVER www AS cover
    FROM sumchanges
    WINDOW www AS (
      partition by userid
      ORDER BY tickdate )
)
-- reduce 1 day from the enddate to make closed interval
SELECT
userid
, startdate
, enddate-1 as enddate
, cash
, cover
FROM aggregated
WHERE cover > 0
ORDER BY userid, startdate
;

Outcome: Outcome

like image 889
Matti Ilvonen Avatar asked Oct 16 '18 07:10

Matti Ilvonen


2 Answers

The logic is:

  • at the beginning of an interval add its value to a cumulative sum
  • at the end of an interval substract its value from this sum
  • but in order to sweep the dateline, we'll have to collect al the (unique) date/time stamps, either start or stop.

So the point is: convert the data from a series of intervals to a series of (start/stop) events, and aggregate over these.


-- \i tmp.sql

create table coinsonperiod(
  id serial,
  startdate date,
  enddate date,
  coins integer
);
insert into coinsonperiod (startdate, enddate, coins) values
  ('2018-01-01','2018-01-31', 80)
, ('2018-01-07','2018-01-10', 10)
, ('2018-01-07','2018-01-31', 10)
, ('2018-01-11','2018-01-31', 5)
, ('2018-01-25','2018-01-27', 5)
, ('2018-02-02','2018-02-23', 100)
        ;

WITH changes AS (
    SELECT startdate AS tickdate , coins
            , 1 AS cover
    FROM coinsonperiod
    UNION ALL
    -- add 1 day to convert to half-open intervals
    SELECT 1+enddate AS tickdate, -1* coins
            , -1 AS cover
    FROM coinsonperiod
    )
, sumchanges  AS (
        SELECT tickdate, SUM(coins) AS change, SUM(cover) AS cover
        FROM changes
        GROUP BY tickdate
        )
, aggregated AS (
        SELECT
        tickdate AS startdate
        , lead(tickdate) over www AS enddate
        , sum(change) OVER www AS cash
          -- number of covered intervals
        , sum(cover) OVER www AS cover
        FROM sumchanges
        WINDOW www AS (ORDER BY tickdate)
        )
             -- substract one day from enddate to correct back to closed intervals
SELECT startdate, enddate-1 AS enddate, cash, cover
FROM aggregated
WHERE cover > 0
ORDER BY startdate
        ;
like image 134
joop Avatar answered Nov 05 '22 14:11

joop


Looks like I found an ugly one that works

select t1.dt, t1.enddt, sum(coins)
from (
    select distinct cp1.dt, min(cp2.dt) enddt
    from ( select startdate as dt from coinsonperiod union all select enddate as dt from coinsonperiod ) cp1, 
         ( select startdate as dt from coinsonperiod union all select enddate as dt from coinsonperiod ) cp2
    where cp2.dt > cp1.dt
    group by cp1.dt
    order by cp1.dt ) t1, coinsonperiod t2
where t1.dt between t2.startdate and t2.enddate
and t1.enddt between t2.startdate and t2.enddate
group by t1.dt, t1.enddt

Output:

dt         |enddt      |sum |
-----------|-----------|----|
2018-01-01 |2018-01-07 |80  |
2018-01-07 |2018-01-10 |100 |
2018-01-10 |2018-01-11 |90  |
2018-01-11 |2018-01-25 |95  |
2018-01-25 |2018-01-27 |100 |
2018-01-27 |2018-01-31 |95  |
2018-02-02 |2018-02-23 |100 |

Only difference with your output is that I suppose you forgot the interval between 01/10 and 01/11

like image 35
StephaneM Avatar answered Nov 05 '22 12:11

StephaneM