Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL grouping by datetime with a maximum difference of x minutes

I have a problem with grouping my dataset in MS SQL Server.

My table looks like

 # | CustomerID | SalesDate           | Turnover
---| ---------- | ------------------- | ---------
 1 | 1          | 2016-08-09 12:15:00 |  22.50
 2 | 1          | 2016-08-09 12:17:00 |  10.00
 3 | 1          | 2016-08-09 12:58:00 |  12.00
 4 | 1          | 2016-08-09 13:01:00 |  55.00
 5 | 1          | 2016-08-09 23:59:00 |  10.00
 6 | 1          | 2016-08-10 00:02:00 |   5.00

Now I want to group the rows where the SalesDate difference to the next row is of a maximum of 5 minutes. So that row 1 & 2, 3 & 4 and 5 & 6 are each one group.

My approach was getting the minutes with the DATEPART() function and divide the result by 5:

(DATEPART(MINUTE, SalesDate) / 5)

For row 1 and 2 the result would be 3 and grouping here would work perfectly. But for the other rows where there is a change in the hour or even in the day part of the SalesDate, the result cannot be used for grouping.

So this is where I'm stuck. I would really appreciate, if someone could point me in the right direction.

like image 209
Felix Avatar asked Aug 09 '16 13:08

Felix


2 Answers

You want to group adjacent transactions based on the timing between them. The idea is to assign some sort of grouping identifier, and then use that for aggregation.

Here is an approach:

  • Identify group starts using lag() and date arithmetic.
  • Do a cumulative sum of the group starts to identify each group.
  • Aggregate

The query looks like this:

select customerid, min(salesdate), max(saledate), sum(turnover)
from (select t.*,
             sum(case when salesdate > dateadd(minute, 5, prev_salesdate)
                      then 1 else 0
                 end) over (partition by customerid order by salesdate) as grp
      from (select t.*,
                   lag(salesdate) over (partition by customerid order by salesdate) as prev_salesdate
            from t
           ) t
     ) t
group by customerid, grp;
like image 191
Gordon Linoff Avatar answered Oct 09 '22 18:10

Gordon Linoff


EDIT

Thanks to @JoeFarrell for pointing out I have answered the wrong question. The OP is looking for dynamic time differences between rows, but this approach creates fixed boundaries.

Original Answer

You could create a time table. This is a table that contains one record for each second of the day. Your table would have a second column that you can use to perform group bys on.

CREATE TABLE [Time]
    (
        TimeId      TIME(0) PRIMARY KEY,
        TimeGroup   TIME
    )
;

-- You could use a loop here instead.
INSERT INTO [Time]
    (
        TimeId,
        TimeGroup
    )
VALUES
    ('00:00:00', '00:00:00'),    -- First group starts here.
    ('00:00:01', '00:00:00'),
    ('00:00:02', '00:00:00'),
    ('00:00:03', '00:00:00'),
    ...
    ('00:04:59', '00:00:00'),
    ('00:05:00', '00:05:00'),    -- Second group starts here.
    ('00:05:01', '00:05:00')
;

The approach works best when:

  1. You need to reuse your custom grouping in several different queries.
  2. You have two or more custom groups you often use.

Once populated you can simply join to the table and output the desired result.

/* Using the time table.
 */
SELECT
    t.TimeGroup,
    SUM(Turnover) AS SumOfTurnover
FROM
    Sales AS s
        INNER JOIN [Time] AS t      ON t.TimeId = CAST(s.SalesDate AS Time(0))
GROUP BY
    t.TimeGroup
;
like image 39
David Rushton Avatar answered Oct 09 '22 19:10

David Rushton