
How to aggregate (counting distinct items) over a sliding window in SQL Server?

I am currently using this query (in SQL Server) to count the number of unique items each day:

SELECT Date, COUNT(DISTINCT item) 
FROM myTable 
GROUP BY Date 
ORDER BY Date

How can I change this to get, for each date, the number of unique items over the past 3 days (including the current day)?

The output should be a table with 2 columns: the first lists every date in the original table, and the second gives the number of unique items in the 3-day window ending on that date.

For instance, if the original table is:

Date        Item  
01/01/2018  A  
01/01/2018  B  
02/01/2018  C  
03/01/2018  C    
04/01/2018  C

With my query above I currently get the unique count for each day:

Date        count  
01/01/2018  2  
02/01/2018  1  
03/01/2018  1  
04/01/2018  1

and I would like to get the unique count over a 3-day rolling window:

Date        count  
01/01/2018  2  
02/01/2018  3  (because items ABC on 1st and 2nd Jan)
03/01/2018  3  (because items ABC on 1st,2nd,3rd Jan)    
04/01/2018  1  (because only item C on 2nd,3rd,4th Jan)    
RockScience asked Jan 15 '18 02:01



1 Answer

Using an apply provides a convenient way to form sliding windows.

Setup:

CREATE TABLE myTable 
    ([DateCol] datetime, [Item] varchar(1))
;

INSERT INTO myTable 
    ([DateCol], [Item])
VALUES
    ('2018-01-01 00:00:00', 'A'),
    ('2018-01-01 00:00:00', 'B'),
    ('2018-01-02 00:00:00', 'C'),
    ('2018-01-03 00:00:00', 'C'),
    ('2018-01-04 00:00:00', 'C')
;

CREATE NONCLUSTERED INDEX IX_DateCol
    ON myTable([DateCol])
;

Query:

select distinct 
       t1.dateCol
     , oa.ItemCount
from myTable t1
outer apply (
      select count(distinct t2.item) as ItemCount
      from myTable t2
      where t2.DateCol between dateadd(day,-2,t1.DateCol) and t1.DateCol
  ) oa
order by t1.dateCol ASC

Results:

|              dateCol | ItemCount |
|----------------------|-----------|
| 2018-01-01T00:00:00Z |         2 |
| 2018-01-02T00:00:00Z |         3 |
| 2018-01-03T00:00:00Z |         3 |
| 2018-01-04T00:00:00Z |         1 |

There may be some performance gains from reducing the date column to its distinct values before using the apply, like so:

select
       d.DateCol
     , oa.ItemCount
from (
    -- one row per distinct date, so the apply runs once per date
    select distinct t1.DateCol
    from myTable t1
     ) d
outer apply (
      select count(distinct t2.Item) as ItemCount
      from myTable t2
      where t2.DateCol between dateadd(day,-2,d.DateCol) and d.DateCol
  ) oa
order by d.DateCol ASC
;

Instead of select distinct in that subquery you could use group by, but the execution plan will remain the same.
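
For illustration, the group by form of that derived table might look like the sketch below. It assumes the same myTable with the DateCol and Item columns created in the setup, and it changes only the inner subquery; the apply and its 3-day window are the same as in the queries above.

```sql
select
       d.DateCol
     , oa.ItemCount
from (
    -- group by yields one row per distinct date, same as select distinct
    select t1.DateCol
    from myTable t1
    group by t1.DateCol
     ) d
outer apply (
      -- count distinct items from two days before d.DateCol up to d.DateCol
      select count(distinct t2.Item) as ItemCount
      from myTable t2
      where t2.DateCol between dateadd(day,-2,d.DateCol) and d.DateCol
  ) oa
order by d.DateCol ASC
;
```

On the sample data this should return the same four rows as the results table above.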

Demo at SQL Fiddle

Paul Maxwell answered Oct 08 '22 20:10
