I'm querying a snapshot of customer data that contains the snapshot date, the customer ID and the 'value' of that customer on that day. I use the LAG function to return the previous days value to know if there is a drop/rise/complete loss/complete new value (from £0 to > £0).
The end game is to identify the min and max dates where the customer was at £0 value.
Originally I tried MIN(Date) and Max(Date) grouping by the Customer and Value. However if a customer dropped to £0 over different date ranges, it would bring back the max of the latest date range and the min of the earliest, instead of the ideal - bring back both ranges where it was £0.
I've tried using DENSE_RANK() to split each the values of the customer, but doing so just ranks all £0 values in the same rank.
Here is some sample code to show you the data I'm working with and how i've tried to split it:
DROP TABLE IF EXISTS #SnapshotTable
CREATE TABLE #SnapshotTable
(
Row_ID INT IDENTITY(1,1)
,SnapshotDate DATE
,SnapshotDateKey INT
,CustomerId INT
,Value DECIMAL(18,2)
)
INSERT INTO #SnapshotTable (SnapshotDate, SnapshotDateKey, CustomerId, Value)
SELECT '2019-01-01', 20190101, 1, 0.00
UNION SELECT '2019-01-02', 20190102, 1, 0.00
UNION SELECT '2019-01-03', 20190103, 1, 5.00
UNION SELECT '2019-01-04', 20190104, 1, 5.00
UNION SELECT '2019-01-05', 20190105, 1, 3.00
UNION SELECT '2019-01-06', 20190106, 1, 3.00
UNION SELECT '2019-01-07', 20190107, 1, 0.00
UNION SELECT '2019-01-08', 20190108, 1, 0.00
UNION SELECT '2019-01-09', 20190109, 1, 10.00
UNION SELECT '2019-01-10', 20190110, 1, 0.00
SELECT * FROM #SnapshotTable
-- Code that doesn't work correctly
SELECT
CustomerId
,Value
,MinDate = MIN(SnapshotDateKey)
,MaxDate = MAX(SnapshotDateKey)
FROM #SnapshotTable
GROUP BY
CustomerId
,Value
-- Attempted with dense rank
ALTER TABLE #SnapshotTable
ADD DenseRankTest INT NULL
GO
-- Update with Dense Rank
UPDATE TGT
SET
TGT.DenseRankTest = SRC.NewRank
FROM #SnapshotTable TGT
INNER JOIN (SELECT
Row_ID
,NewRank = DENSE_RANK() OVER (PARTITION BY CustomerId ORDER BY Value ASC)
FROM #SnapshotTable
) AS SRC
ON SRC.Row_ID = TGT.Row_ID
SELECT * FROM #SnapshotTable
Now I can see that the dense_rank() function is kind of functioning how I want it to but honestly I've been looking at this for a while now and I cannot get my head around how to do it correctly.
Can somebody please advise on what I need to do?
I'm expecting to see:
SELECT [StartDateKey] = 20190101, [EndDateKey] = 20190102, [CustomerId] = 1, [Value] = 0
UNION SELECT [StartDateKey] = 20190103, [EndDateKey] = 20190104, [CustomerId] = 1, [Value] = 5
UNION SELECT [StartDateKey] = 20190105, [EndDateKey] = 20190106, [CustomerId] = 1, [Value] = 3
UNION SELECT [StartDateKey] = 20190107, [EndDateKey] = 20190108, [CustomerId] = 1, [Value] = 0
UNION SELECT [StartDateKey] = 20190109, [EndDateKey] = 20190109, [CustomerId] = 1, [Value] = 10
UNION SELECT [StartDateKey] = 20190120, [EndDateKey] = 20190110, [CustomerId] = 1, [Value] = 0
Edit: For those who stumble across this, with the help of the people here I've found this as a good read for understanding the issue/solving the issue.
Returns the earliest date found in a field or group. You can apply a filter to limit the rows in the search. You can also use MIN to return the smallest numeric value or the lowest string value in a field or group.
MAX and MIN operate on columns that contain character, graphic, numeric, date/time, and binary data (except for binary large object, or BLOB, data). The parentheses are required.
Note: MIN and MAX can be used with numeric, string, and date values.
If you're working with MySQL, you can combine MAX() with the GREATEST() function to get the biggest value from two or more fields. Here's the syntax for GREATEST: GREATEST(value1,value2,...)
This is a gaps-and-islands problem. But the accepted answer on the purported duplicate is simply not the best way to approach this. And the more highly voted answer is still overly complicated.
A much simpler method is:
select customerid, value, min(SnapshotDateKey), max(SnapshotDateKey)
from (select st.*,
row_number() over (partition by customerid, value order by snapshotdate) as seqnum
from snapshottable st
) st
group by dateadd(day, -seqnum, snapshotdate), customerid, value
order by min(SnapshotDateKey);
Here is a db<>fiddle.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With