I am using SQL Server 2008 and need to create a query that shows rows that fall within a date range.
My table is as follows:
ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME
My rules are:
I would like another column added to the results which identify the grouped value if possible as EP_ID
.
e.g.
ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME
------ ------ -------------- ---------------
1 9 2014-10-12 00:00:00 2014-10-13 15:00:00
2 9 2014-10-14 14:00:00 2014-10-15 15:00:00
3 9 2014-10-16 14:00:00 2014-10-17 15:00:00
4 9 2014-11-20 00:00:00 2014-11-21 00:00:00
5 5 2014-10-17 00:00:00 2014-10-18 00:00:00
Would return rows with:
ADM_ID WH_PID EP_ID EP_IN_DATETIME EP_OUT_DATETIME WH_IN_DATETIME WH_OUT_DATETIME
------ ------ ----- ------------------- ------------------- ------------------- -------------------
1 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-12 00:00:00 2014-10-13 15:00:00
2 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-14 14:00:00 2014-10-15 15:00:00
3 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-16 14:00:00 2014-10-17 15:00:00
4 9 2 2014-11-20 00:00:00 2014-11-20 00:00:00 2014-10-16 14:00:00 2014-11-21 00:00:00
5 5 1 2014-10-17 00:00:00 2014-10-18 00:00:00 2014-10-17 00:00:00 2014-10-18 00:00:00
The EP_OUT_DATETIME will always be the latest date in the group. Hope this clarifies a bit. This way, I can group by the EP_ID and find the EP_OUT_DATETIME and start time for any ADM_ID/PID that fall within.
Each should roll into the next, meaning that if another row has an WH_IN_DATETIME which follows on the WH_OUT_DATETIME of another for the same WH_PID, than that row's WH_OUT_DATETIME becomes the EP_OUT_DATETIME for all of the WH_PID's within that EP_ID.
I hope this makes some sense.
Thanks, MR
Since the question does not specify that the solution be a "single" query ;-), here is another approach: using the "quirky update" feature dealy, which is updating a variable at the same time you update a column. Breaking down the complexity of this operation, I create a scratch table to hold the piece that is the hardest to calculate: the EP_ID
. Once that is done, it gets joined into a simple query and provides the window with which to calculate the EP_IN_DATETIME
and EP_OUT_DATETIME
fields.
The steps are:
ADM_ID
values -- this lets us do an UPDATE as all of the rows already exist.The Test Setup
SET ANSI_NULLS ON;
SET NOCOUNT ON;
CREATE TABLE #Table
(
ADM_ID INT NOT NULL PRIMARY KEY,
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
WH_OUT_DATETIME DATETIME NOT NULL
);
INSERT INTO #Table VALUES (1, 9, '2014-10-12 00:00:00', '2014-10-13 15:00:00');
INSERT INTO #Table VALUES (2, 9, '2014-10-14 14:00:00', '2014-10-15 15:00:00');
INSERT INTO #Table VALUES (3, 9, '2014-10-16 14:00:00', '2014-10-17 15:00:00');
INSERT INTO #Table VALUES (4, 9, '2014-11-20 00:00:00', '2014-11-21 00:00:00');
INSERT INTO #Table VALUES (5, 5, '2014-10-17 00:00:00', '2014-10-18 00:00:00');
Step 1: Create and Populate the Scratch Table
CREATE TABLE #Scratch
(
ADM_ID INT NOT NULL PRIMARY KEY,
EP_ID INT NOT NULL
-- Might need WH_PID and WH_IN_DATETIME fields to guarantee proper UPDATE ordering
);
INSERT INTO #Scratch (ADM_ID, EP_ID)
SELECT ADM_ID, 0
FROM #Table;
Alternate scratch table structure to ensure proper update order (since "quirky update" uses the order of the Clustered Index, as noted at the bottom of this answer):
CREATE TABLE #Scratch
(
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
ADM_ID INT NOT NULL,
EP_ID INT NOT NULL
);
INSERT INTO #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID, EP_ID)
SELECT WH_PID, WH_IN_DATETIME, ADM_ID, 0
FROM #Table;
CREATE UNIQUE CLUSTERED INDEX [CIX_Scratch]
ON #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID);
Step 2: Update the Scratch Table using a local variable to keep track of the prior value
DECLARE @EP_ID INT; -- this is used in the UPDATE
;WITH cte AS
(
SELECT TOP (100) PERCENT
t1.*,
t2.WH_OUT_DATETIME AS [PriorOut],
t2.ADM_ID AS [PriorID],
ROW_NUMBER() OVER (PARTITION BY t1.WH_PID ORDER BY t1.WH_IN_DATETIME)
AS [RowNum]
FROM #Table t1
LEFT JOIN #Table t2
ON t2.WH_PID = t1.WH_PID
AND t2.ADM_ID <> t1.ADM_ID
AND t2.WH_OUT_DATETIME >= (t1.WH_IN_DATETIME - 1)
AND t2.WH_OUT_DATETIME < t1.WH_IN_DATETIME
ORDER BY t1.WH_PID, t1.WH_IN_DATETIME
)
UPDATE sc
SET @EP_ID = sc.EP_ID = CASE
WHEN cte.RowNum = 1 THEN 1
WHEN cte.[PriorOut] IS NULL THEN (@EP_ID + 1)
ELSE @EP_ID
END
FROM #Scratch sc
INNER JOIN cte
ON cte.ADM_ID = sc.ADM_ID
Step 3: Select Joining the Scratch Table
SELECT tab.ADM_ID,
tab.WH_PID,
sc.EP_ID,
MIN(tab.WH_IN_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_IN_DATETIME],
MAX(tab.WH_OUT_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_OUT_DATETIME],
tab.WH_IN_DATETIME,
tab.WH_OUT_DATETIME
FROM #Table tab
INNER JOIN #Scratch sc
ON sc.ADM_ID = tab.ADM_ID
ORDER BY tab.ADM_ID;
Resources
MSDN page for UPDATE
look for "@variable = column = expression"
Performance Analysis of doing Running Totals (not exactly the same thing as here, but not too far off)
This blog post does mention:
WH_PID
values are not at least grouped together naturally via the ordering of the clustered index and ordered by WH_IN_DATETIME
, then those two fields just get added to the scratch table and the PK (with implied clustered index) on the scratch table becomes (WH_PID, WH_IN_DATETIME, ADM_ID)
.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With