I am using SQL Server 2008 and need to create a query that shows rows that fall within a date range.
My table is as follows:
ADM_ID    WH_PID     WH_IN_DATETIME    WH_OUT_DATETIME
My rules are:
I would like another column added to the results which identify the grouped value if possible as EP_ID.
e.g.
ADM_ID    WH_PID    WH_IN_DATETIME         WH_OUT_DATETIME
------    ------    --------------         ---------------
1         9         2014-10-12 00:00:00    2014-10-13 15:00:00
2         9         2014-10-14 14:00:00    2014-10-15 15:00:00
3         9         2014-10-16 14:00:00    2014-10-17 15:00:00
4         9         2014-11-20 00:00:00    2014-11-21 00:00:00
5         5         2014-10-17 00:00:00    2014-10-18 00:00:00
Would return rows with:
ADM_ID   WH_PID   EP_ID   EP_IN_DATETIME        EP_OUT_DATETIME       WH_IN_DATETIME        WH_OUT_DATETIME
------   ------   -----   -------------------   -------------------   -------------------   -------------------
1        9        1       2014-10-12 00:00:00   2014-10-17 15:00:00   2014-10-12 00:00:00   2014-10-13 15:00:00
2        9        1       2014-10-12 00:00:00   2014-10-17 15:00:00   2014-10-14 14:00:00   2014-10-15 15:00:00
3        9        1       2014-10-12 00:00:00   2014-10-17 15:00:00   2014-10-16 14:00:00   2014-10-17 15:00:00
4        9        2       2014-11-20 00:00:00   2014-11-20 00:00:00   2014-10-16 14:00:00   2014-11-21 00:00:00
5        5        1       2014-10-17 00:00:00   2014-10-18 00:00:00   2014-10-17 00:00:00   2014-10-18 00:00:00
The EP_OUT_DATETIME will always be the latest date in the group. Hope this clarifies a bit. This way, I can group by the EP_ID and find the EP_OUT_DATETIME and start time for any ADM_ID/PID that fall within.
Each should roll into the next, meaning that if another row has an WH_IN_DATETIME which follows on the WH_OUT_DATETIME of another for the same WH_PID, than that row's WH_OUT_DATETIME becomes the EP_OUT_DATETIME for all of the WH_PID's within that EP_ID.
I hope this makes some sense.
Thanks, MR
Since the question does not specify that the solution be a "single" query ;-), here is another approach: using the "quirky update" feature dealy, which is updating a variable at the same time you update a column. Breaking down the complexity of this operation, I create a scratch table to hold the piece that is the hardest to calculate: the EP_ID. Once that is done, it gets joined into a simple query and provides the window with which to calculate the EP_IN_DATETIME and EP_OUT_DATETIME fields.
The steps are:
ADM_ID values -- this lets us do an UPDATE as all of the rows already exist.The Test Setup
SET ANSI_NULLS ON;
SET NOCOUNT ON;
CREATE TABLE #Table
(
   ADM_ID INT NOT NULL PRIMARY KEY,
   WH_PID INT NOT NULL,
   WH_IN_DATETIME DATETIME NOT NULL,
   WH_OUT_DATETIME DATETIME NOT NULL
);
INSERT INTO #Table VALUES (1, 9, '2014-10-12 00:00:00', '2014-10-13 15:00:00');
INSERT INTO #Table VALUES (2, 9, '2014-10-14 14:00:00', '2014-10-15 15:00:00');
INSERT INTO #Table VALUES (3, 9, '2014-10-16 14:00:00', '2014-10-17 15:00:00');
INSERT INTO #Table VALUES (4, 9, '2014-11-20 00:00:00', '2014-11-21 00:00:00');
INSERT INTO #Table VALUES (5, 5, '2014-10-17 00:00:00', '2014-10-18 00:00:00');
Step 1: Create and Populate the Scratch Table
CREATE TABLE #Scratch
(
   ADM_ID INT NOT NULL PRIMARY KEY,
   EP_ID INT NOT NULL
   -- Might need WH_PID and WH_IN_DATETIME fields to guarantee proper UPDATE ordering
);
INSERT INTO #Scratch (ADM_ID, EP_ID)
   SELECT ADM_ID, 0
   FROM   #Table;
Alternate scratch table structure to ensure proper update order (since "quirky update" uses the order of the Clustered Index, as noted at the bottom of this answer):
CREATE TABLE #Scratch
(
   WH_PID INT NOT NULL,
   WH_IN_DATETIME DATETIME NOT NULL,
   ADM_ID INT NOT NULL,
   EP_ID INT NOT NULL
);
INSERT INTO #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID, EP_ID)
   SELECT WH_PID, WH_IN_DATETIME, ADM_ID, 0
   FROM   #Table;
CREATE UNIQUE CLUSTERED INDEX [CIX_Scratch]
   ON #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID);
Step 2: Update the Scratch Table using a local variable to keep track of the prior value
DECLARE @EP_ID INT; -- this is used in the UPDATE
;WITH cte AS
(
  SELECT TOP (100) PERCENT
         t1.*,
         t2.WH_OUT_DATETIME AS [PriorOut],
         t2.ADM_ID AS [PriorID],
         ROW_NUMBER() OVER (PARTITION BY t1.WH_PID ORDER BY t1.WH_IN_DATETIME)
                AS [RowNum]
  FROM   #Table t1
  LEFT JOIN #Table t2
         ON t2.WH_PID = t1.WH_PID
        AND t2.ADM_ID <> t1.ADM_ID
        AND t2.WH_OUT_DATETIME >= (t1.WH_IN_DATETIME - 1)
        AND t2.WH_OUT_DATETIME < t1.WH_IN_DATETIME
  ORDER BY t1.WH_PID, t1.WH_IN_DATETIME
)
UPDATE sc
SET    @EP_ID = sc.EP_ID = CASE
                               WHEN cte.RowNum = 1 THEN 1
                               WHEN cte.[PriorOut] IS NULL THEN (@EP_ID + 1)
                               ELSE @EP_ID
                        END
FROM   #Scratch sc
INNER JOIN cte
        ON cte.ADM_ID = sc.ADM_ID
Step 3: Select Joining the Scratch Table
SELECT tab.ADM_ID,
       tab.WH_PID,
       sc.EP_ID,
       MIN(tab.WH_IN_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
           AS [EP_IN_DATETIME],
       MAX(tab.WH_OUT_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
           AS [EP_OUT_DATETIME],
       tab.WH_IN_DATETIME,
       tab.WH_OUT_DATETIME
FROM   #Table tab
INNER JOIN #Scratch sc
    ON sc.ADM_ID = tab.ADM_ID
ORDER BY tab.ADM_ID;
Resources
MSDN page for UPDATE
look for "@variable = column = expression"
Performance Analysis of doing Running Totals (not exactly the same thing as here, but not too far off)
This blog post does mention:
WH_PID values are not at least grouped together naturally via the ordering of the clustered index and ordered by WH_IN_DATETIME, then those two fields just get added to the scratch table and the PK (with implied clustered index) on the scratch table becomes (WH_PID, WH_IN_DATETIME, ADM_ID).If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With