Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating groups of consecutive days meeting a given criteria

I have table the following data structure in SQL Server:

ID  Date        Allocation
 1, 2012-01-01, 0
 2, 2012-01-02, 2
 3, 2012-01-03, 0
 4, 2012-01-04, 0
 5, 2012-01-05, 0
 6, 2012-01-06, 5

etc.

What I need to do is get all consecutive day periods where Allocation = 0, and in the following form:

Start Date    End Date     DayCount
2012-01-01    2012-01-01   1
2012-01-03    2012-01-05   3

etc.

Is it possible to do this in SQL, and if so how?

like image 360
Istari Avatar asked Feb 14 '12 10:02

Istari


2 Answers

In this answer, I'll assume that the "id" field numbers the rows consecutively when sorted by increasing date, like it does in the example data. (Such a column can be created if it does not exist).

This is an example of a technique described here and here.

1) Join the table to itself on adjacent "id" values. This pairs adjacent rows. Select rows where the "allocation" field has changed. Store the result in a temporary table, also keeping a running index.

SET @idx = 0;
CREATE TEMPORARY TABLE boundaries
SELECT
   (@idx := @idx + 1) AS idx,
   a1.date AS prev_end,
   a2.date AS next_start,
   a1.allocation as allocation
FROM allocations a1
JOIN allocations a2
ON (a2.id = a1.id + 1)
WHERE a1.allocation != a2.allocation;

This gives you a table having "the end of the previous period", "the start of the next period", and "the value of 'allocation' in the previous period" in each row:

+------+------------+------------+------------+
| idx  | prev_end   | next_start | allocation |
+------+------------+------------+------------+
|    1 | 2012-01-01 | 2012-01-02 |          0 |
|    2 | 2012-01-02 | 2012-01-03 |          2 |
|    3 | 2012-01-05 | 2012-01-06 |          0 |
+------+------------+------------+------------+

2) We need the start and end of each period in the same row, so we need to combine adjacent rows again. Do this by creating a second temporary table like boundaries but having an idx field 1 greater:

+------+------------+------------+
| idx  | prev_end   | next_start |
+------+------------+------------+
|    2 | 2012-01-01 | 2012-01-02 |
|    3 | 2012-01-02 | 2012-01-03 |
|    4 | 2012-01-05 | 2012-01-06 |
+------+------------+------------+

Now join on the idx field and we get the answer:

SELECT
  boundaries2.next_start AS start,
  boundaries.prev_end AS end,
  allocation
FROM boundaries
JOIN boundaries2
USING(idx);

+------------+------------+------------+
| start      | end        | allocation |
+------------+------------+------------+
| 2012-01-02 | 2012-01-02 |          2 |
| 2012-01-03 | 2012-01-05 |          0 |
+------------+------------+------------+

** Note that this answer gets the "internal" periods correctly but misses the two "edge" periods where allocation = 0 at the beginning and allocation = 5 at the end. Those can be pulled in using UNION clauses but I wanted to present the core idea without that complication.

like image 76
gcbenison Avatar answered Nov 06 '22 10:11

gcbenison


Following would be one way to do it. The gist of this solution is

  • Use a CTE to get a list of all consecutive start and enddates with Allocation = 0
  • Use the ROW_NUMBER window function to assign rownumbers depending on both start- and enddates.
  • Select only those records where both ROW_NUMBERS equal 1.
  • Use DATEDIFFto calculate the DayCount

SQL Statement

;WITH r AS (
  SELECT  StartDate = Date, EndDate = Date
  FROM    YourTable
  WHERE   Allocation = 0
  UNION ALL
  SELECT  r.StartDate, q.Date
  FROM    r
          INNER JOIN YourTable q ON DATEDIFF(dd, r.EndDate, q.Date) = 1
  WHERE   q.Allocation = 0          
)
SELECT  [Start Date] = s.StartDate
        , [End Date ] = s.EndDate
        , [DayCount] = DATEDIFF(dd, s.StartDate, s.EndDate) + 1
FROM    (
          SELECT  *
                  , rn1 = ROW_NUMBER() OVER (PARTITION BY StartDate ORDER BY EndDate DESC)
                  , rn2 = ROW_NUMBER() OVER (PARTITION BY EndDate ORDER BY StartDate ASC)
          FROM    r          
        ) s
WHERE   s.rn1 = 1
        AND s.rn2 = 1
OPTION  (MAXRECURSION 0)

Test script

;WITH q (ID, Date, Allocation) AS (
  SELECT * FROM (VALUES
    (1, '2012-01-01', 0)
    , (2, '2012-01-02', 2)
    , (3, '2012-01-03', 0)
    , (4, '2012-01-04', 0)
    , (5, '2012-01-05', 0)
    , (6, '2012-01-06', 5)
  ) a (a, b, c)
)
, r AS (
  SELECT  StartDate = Date, EndDate = Date
  FROM    q
  WHERE   Allocation = 0
  UNION ALL
  SELECT  r.StartDate, q.Date
  FROM    r
          INNER JOIN q ON DATEDIFF(dd, r.EndDate, q.Date) = 1
  WHERE   q.Allocation = 0          
)
SELECT  s.StartDate, s.EndDate, DATEDIFF(dd, s.StartDate, s.EndDate) + 1
FROM    (
          SELECT  *
                  , rn1 = ROW_NUMBER() OVER (PARTITION BY StartDate ORDER BY EndDate DESC)
                  , rn2 = ROW_NUMBER() OVER (PARTITION BY EndDate ORDER BY StartDate ASC)
          FROM    r          
        ) s
WHERE   s.rn1 = 1
        AND s.rn2 = 1
OPTION  (MAXRECURSION 0)
like image 33
Lieven Keersmaekers Avatar answered Nov 06 '22 12:11

Lieven Keersmaekers