Im trying to write a function which identifies groups of dates, and measures the size of the group. I've been doing this procedurally in Python until now but I'd like to move it into SQL. for example, the list <pre class="prettyprint"><code>Bill 01/01/2011 Bill 02/01/2011 Bill 03/01/2011 Bill 05/01/2011 Bill 07/01/2011 </code></pre> should be output into a new table as: <pre class="prettyprint"><code>Bill 01/01/2011 3 Bill 02/01/2011 3 Bill 03/01/2011 3 Bill 05/01/2011 1 Bill 07/01/2011 1 </code></pre> Ideally this should also be able to account for weekends and public holidays - the dates in my table will aways be Mon-Fri (I think I can solve this by making a new table of working days and numbering them in sequence). Someone at work suggested I try a CTE. Im pretty new to this, so I'd appreciate any guidance anyone could provide! Thanks.

You can do this with a clever application of window functions. Consider the following: <pre class="prettyprint"><code>select name, date, row_number() over (partition by name order by date) from t </code></pre> This adds a row number, which in your example would simply be 1, 2, 3, 4, 5. Now, take the difference from the date, and you have a constant value for the group. <pre class="prettyprint"><code>select name, date, dateadd(d, - row_number() over (partition by name order by date), date) as val from t </code></pre> Finally, you want the number of groups in sequence. I would also add a group identifier (for instance, to distinguish between the last two). <pre class="prettyprint"><code>select name, date, count(*) over (partition by name, val) as NumInSeq, dense_rank() over (partition by name order by val) as SeqID from (select name, date, dateadd(d, - row_number() over (partition by name order by date), date) as val from t ) t </code></pre> Somehow, I missed the part about weekdays and holidays. This solution does not solve that problem.

How can I identify groups of consecutive dates in SQL?

Tags:

date

sql

gaps-and-islands

Im trying to write a function which identifies groups of dates, and measures the size of the group.

I've been doing this procedurally in Python until now but I'd like to move it into SQL.

for example, the list

Bill 01/01/2011 
Bill 02/01/2011 
Bill 03/01/2011 
Bill 05/01/2011 
Bill 07/01/2011

should be output into a new table as:

Bill 01/01/2011  3 
Bill 02/01/2011  3 
Bill 03/01/2011  3 
Bill 05/01/2011  1 
Bill 07/01/2011  1

Ideally this should also be able to account for weekends and public holidays - the dates in my table will aways be Mon-Fri (I think I can solve this by making a new table of working days and numbering them in sequence). Someone at work suggested I try a CTE. Im pretty new to this, so I'd appreciate any guidance anyone could provide! Thanks.

896

asked Jul 10 '12 14:07

Pythonn00b

2 Answers

You can do this with a clever application of window functions. Consider the following:

select name, date, row_number() over (partition by name order by date)
from t

This adds a row number, which in your example would simply be 1, 2, 3, 4, 5. Now, take the difference from the date, and you have a constant value for the group.

select name, date,
       dateadd(d, - row_number() over (partition by name order by date), date) as val
from t

Finally, you want the number of groups in sequence. I would also add a group identifier (for instance, to distinguish between the last two).

select name, date,
       count(*) over (partition by name, val) as NumInSeq,
       dense_rank() over (partition by name order by val) as SeqID
from (select name, date,
             dateadd(d, - row_number() over (partition by name order by date), date) as val
      from t
     ) t

Somehow, I missed the part about weekdays and holidays. This solution does not solve that problem.

answered Sep 28 '22 01:09

Gordon Linoff

The following query account the weekends and holidays. The query has a provision to include the holidays on-the-fly, though for the purpose of making the query clearer, I just materialized the holidays to an actual table.

CREATE TABLE tx
    (n varchar(4), d date);

INSERT INTO tx
    (n, d)
VALUES
    ('Bill', '2006-12-29'), -- Friday    
    -- 2006-12-30 is Saturday
    -- 2006-12-31 is Sunday

    -- 2007-01-01 is New Year's Holiday    
    ('Bill', '2007-01-02'), -- Tuesday
    ('Bill', '2007-01-03'), -- Wednesday
    ('Bill', '2007-01-04'), -- Thursday
    ('Bill', '2007-01-05'), -- Friday    
    -- 2007-01-06 is Saturday
    -- 2007-01-07 is Sunday

    ('Bill', '2007-01-08'), -- Monday
    ('Bill', '2007-01-09'), -- Tuesday

    ('Bill', '2012-07-09'), -- Monday
    ('Bill', '2012-07-10'), -- Tuesday
    ('Bill', '2012-07-11'); -- Wednesday

create table holiday(d date);
insert into holiday(d) values
('2007-01-01');


/* query should return 7 consecutive good 
   attendance(from December 29 2006 to January 9 2007) */    
/* and 3 consecutive attendance from July 7 2012 to July 11 2012. */

Query:

with first_date as
(
    -- get the monday of the earliest date
    select dateadd( ww, datediff(ww,0,min(d)), 0 ) as first_date 
    from tx 
)
,shifted as
(
    select 
        tx.n, tx.d,                     
        diff = datediff(day, fd.first_date, tx.d) 
                   - (datediff(day, fd.first_date, tx.d)/7 * 2)             
    from tx
    cross join first_date fd
    union
    select 
        xxx.n, h.d,             
        diff = datediff(day, fd.first_date, h.d) 
                   - (datediff(day, fd.first_date, h.d)/7 * 2) 
    from holiday h 
    cross join first_date fd
    cross join (select distinct n from tx) as xxx
)
,grouped as
(
    select *, grp = diff - row_number() over(partition by n order by d)
    from shifted
)
select 
    d, n, dense_rank() over (partition by n order by grp) as nth_streak
    ,count(*) over (partition by n, grp) as streak
from grouped
where d not in (select d from holiday)  -- remove the holidays

Output:

|          D |    N | NTH_STREAK | STREAK |
-------------------------------------------
| 2006-12-29 | Bill |          1 |      7 |
| 2007-01-02 | Bill |          1 |      7 |
| 2007-01-03 | Bill |          1 |      7 |
| 2007-01-04 | Bill |          1 |      7 |
| 2007-01-05 | Bill |          1 |      7 |
| 2007-01-08 | Bill |          1 |      7 |
| 2007-01-09 | Bill |          1 |      7 |
| 2012-07-09 | Bill |          2 |      3 |
| 2012-07-10 | Bill |          2 |      3 |
| 2012-07-11 | Bill |          2 |      3 |

Live test: http://www.sqlfiddle.com/#!3/815c5/1

The main logic of the query is to shift all the dates two days back. This is done by dividing the date to 7 and multiplying it by two, then subtracting it from the original number. For example, if a given date falls on 15th, this will be computed as 15/7 * 2 == 4; then subtract 4 from the original number, 15 - 4 == 11. 15 will become the 11th day. Likewise the 8th day becomes the 6th day; 8 - (8/7 * 2) == 6.

                  Weekends are not in attendance(e.g. 6,7,13,14)
 1  2  3  4  5     6  7
 8  9 10 11 12    13 14
15

Applying the computation to all the weekday numbers will yield these values:

 1  2  3  4  5    
 6  7  8  9 10    
11

For holidays, you need to slot them on attendance, so to the consecutive-ness could be easily determined, then just remove them from the final query. The above attendance yields 11 consecutive good attendance.

Query logic's detailed explanation here: http://www.ienablemuch.com/2012/07/monitoring-perfect-attendance.html

answered Sep 28 '22 02:09

Michael Buen

Related questions
                            
                                SQL LIKE wildcard space character
                            
                                SQL Group By - counting records per month/year, error on insert - NOT A VALID MONTH
                            
                                syntax error with FOREIGN KEY in CREATE TABLE
                            
                                SQL Prime number function
                            
                                psql shortcut for frequently used queries? (like Unix "alias")
                            
                                Count boolean field values within a single query
                            
                                Unexpected EOF encountered in BCP
                            
                                Giving alias name with space in sql
                            
                                Using IF EXISTS with a CTE
                            
                                ERROR 1833 (HY000): Cannot change column MySQL
                            
                                How to return a page of results from SQL?
                            
                                Converting a String to HEX in SQL
                            
                                Using an Alias column in the where clause in ms-sql 2000
                            
                                SQL Statement Help - Select latest Order for each Customer
                            
                                Unexpected table scan for id != id
                            
                                How to iterate by row through a mysql query in php
                            
                                SQL Server - lack of NATURAL JOIN / x JOIN y USING(field)
                            
                                TSQL strip date from datetime
                            
                                Aggregate function checking 'if contains'
                            
                                Flattening of a 1 row table into a key-value pair table

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With