I really hope some of you like challenges. I have a table of Product Ids, Prices and the Date Ranges for when those prices are active.
+----+-------+---------------------+---------------------+
| Id | Price | StartDate | EndDate |
+----+-------+---------------------+---------------------+
| 1 | 19 | 2016-12-01 00:00:00 | 2017-12-01 23:59:59 |
| 1 | 18 | 2017-01-01 00:00:00 | 2018-01-12 23:59:59 |
| 1 | 17 | 2017-02-03 00:00:00 | 2017-03-03 23:59:59 |
| 1 | 16 | 2018-01-01 00:00:00 | 2018-03-02 23:59:59 |
| 2 | 15 | 2017-01-01 00:00:00 | 2017-03-05 23:59:59 |
| 2 | 15 | 2017-03-06 00:00:00 | 2017-03-31 23:59:59 |
| 2 | 30 | 2017-04-01 00:00:00 | 2017-05-03 23:59:59 |
| 3 | 12 | 2017-01-01 00:00:00 | 2017-01-31 23:59:59 |
| 3 | 12 | 2017-02-01 00:00:00 | 2017-02-28 23:59:59 |
| 4 | 14 | 2017-01-01 00:00:00 | 2017-04-05 23:59:59 |
| 4 | 14 | 2017-04-01 00:00:00 | 2017-04-30 23:59:59 |
| 4 | 12 | 2017-04-15 00:00:00 | 2017-05-30 23:59:59 |
| 5 | 20 | 2017-01-01 00:00:00 | 2017-01-31 23:59:59 |
| 5 | 20 | 2017-03-01 00:00:00 | 2017-03-31 23:59:59 |
| 6 | 15 | 2017-01-01 00:00:00 | 2017-01-31 23:59:59 |
| 6 | 15 | 2017-02-01 00:00:00 | 2017-02-28 23:59:59 |
| 6 | 15 | 2017-04-01 00:00:00 | 2017-04-30 23:59:59 |
+----+-------+---------------------+---------------------+
SQLFiddle: http://sqlfiddle.com/#!6/39288/1
I need to get it in a format where:
Date periods have the same Id and price that "touch" (i.e. Id #3) are merged into one period.
Date periods that overlap (i.e. Id #4) are merged into one period.
The lowest price is shown for each product and during what range.
Date ranges that have gaps and the same price don't merge and are separate rows (i.e. Id #5).
The result should be:
+----+-------+---------------------+---------------------+
| Id | Price | StartDate | EndDate |
+----+-------+---------------------+---------------------+
| 1 | 19 | 2016-12-01 00:00:00 | 2016-12-31 23:59:59 |
| 1 | 18 | 2017-01-01 00:00:00 | 2017-02-02 23:59:59 |
| 1 | 17 | 2017-02-03 00:00:00 | 2017-03-03 23:59:59 |
| 1 | 19 | 2017-03-04 00:00:00 | 2017-12-01 23:59:59 |
| 1 | 18 | 2017-12-02 00:00:00 | 2017-12-31 23:59:59 |
| 1 | 16 | 2018-01-01 00:00:00 | 2018-03-02 23:59:59 |
| 2 | 15 | 2017-01-01 00:00:00 | 2017-03-31 23:59:59 |
| 2 | 30 | 2017-04-01 00:00:00 | 2017-05-03 23:59:59 |
| 3 | 12 | 2017-01-01 00:00:00 | 2017-02-28 23:59:59 |
| 4 | 14 | 2017-01-01 00:00:00 | 2017-04-14 23:59:59 |
| 4 | 12 | 2017-04-15 00:00:00 | 2017-05-30 23:59:59 |
| 5 | 20 | 2017-01-01 00:00:00 | 2017-01-31 23:59:59 |
| 5 | 20 | 2017-03-01 00:00:00 | 2017-03-31 23:59:59 |
| 6 | 15 | 2017-01-01 00:00:00 | 2017-02-28 23:59:59 |
| 6 | 15 | 2017-04-01 00:00:00 | 2017-04-30 23:59:59 |
+----+-------+---------------------+---------------------+
Overall, it's essentially determining the best price between two dates.
I've worked with this table in the past and was able to solve it in C#, but this time I need a pure TSQL approach.
I've already gone down some deep nested CTE's and lost my mind with getting results nowhere near what they should be. Thanks in advance for anyone who can assist.
Edit: I even messed up the desired results because this is so confusing. Fixed (I think).
Edit 2: Example:
+------+-------+-------------------------+-------------------------+
| Id | Price | StartDate | EndDate |
+------+-------+-------------------------+-------------------------+
| 8611 | 31.98 | 2017-06-06 00:00:00.000 | 2017-09-24 23:59:59.000 |
| 8611 | 31.98 | 2017-09-25 00:00:00.000 | 2017-12-31 23:59:59.000 |
| 8611 | 28.78 | 2017-07-31 00:00:00.000 | 2017-09-30 23:59:59.000 |
| 8611 | 28.78 | 2017-10-30 00:00:00.000 | 2017-12-31 23:59:59.000 |
+------+-------+-------------------------+-------------------------+
@GordonLinoff's results to:
+------+-------+-------------------------+-------------------------+
| Id | Price | StartDate | EndDate |
+------+-------+-------------------------+-------------------------+
| 8611 | 28.78 | 2017-06-06 00:00:00.000 | 2017-12-31 23:59:59.000 |
+------+-------+-------------------------+-------------------------+
Result should be:
+------+-------+-------------------------+-------------------------+
| Id | Price | StartDate | EndDate |
+------+-------+-------------------------+-------------------------+
| 8611 | 31.98 | 2017-06-06 00:00:00.000 | 2017-07-30 23:59:59.000 |
| 8611 | 28.78 | 2017-07-31 00:00:00.000 | 2017-09-30 23:59:59.000 |
| 8611 | 31.98 | 2017-10-01 00:00:00.000 | 2017-10-29 23:59:59.000 |
| 8611 | 28.78 | 2017-10-30 00:00:00.000 | 2017-12-31 23:59:59.000 |
+------+-------+-------------------------+-------------------------+
Do you have a Calendar/Date table available? If so, then you can use the date table to help you get the minimum price per product for each date within the periods in your table.
Afterwards you can get the start and end dates of each of your periods by looking at the next and previous records with the same product Id. You can use the LAG and LEAD functions to do this. This gives you the outer bounds of each of your desired groups.
From there it's just a bit of fiddling around to get your final result. I've provided an example below, which should give you the results you want.
--Get the best price per date for each product
WITH BestPricePerDate AS (
SELECT
Id,
MIN(Price) Price,
c.[Date]
FROM [YourTable] yt
INNER JOIN dbo.Calendar c
ON c.[Date] BETWEEN yt.StartDate AND yt.EndDate
GROUP BY Id, [Date]
),
--Check whether the date is the start or the end of a period
PeriodsMarkedPerId AS(
SELECT
Id,
Price,
[Date],
CASE WHEN
ISNULL(LAG(Price,1) OVER (PARTITION BY Id ORDER BY [Date]),-1) <> Price
OR ISNULL(LAG([Date],1) OVER (PARTITION BY Id ORDER BY [Date]),'1999-01-01') <> DATEADD(DAY,-1,[Date]) THEN 1 ELSE 0 END IsStartDate,
CASE WHEN
ISNULL(LEAD(Price,1) OVER (PARTITION BY Id ORDER BY [Date]),-1) <> Price
OR ISNULL(LEAD([Date],1) OVER (PARTITION BY Id ORDER BY [Date]),'1999-01-01') <> DATEADD(DAY,1,[Date]) THEN 1 ELSE 0 END IsEndDate
FROM BestPricePerDate
),
--Keep only the start and end date records
PeriodStartAndEndDates AS(
SELECT
Id,
Price,
[Date],
IsStartDate,
IsEndDate
FROM PeriodsMarkedPerId
WHERE IsStartDate = 1 OR IsEndDate = 1
),
--Move StartDate and EndDate to one record
StartAndEndDatesOnSameRow AS(
SELECT
Id,
Price,
[Date] AS StartDate,
LEAD([Date],1) OVER (ORDER BY Id, [Date]) AS EndDate,
IsStartDate
FROM PeriodStartAndEndDates
)
--Get the resulting periods
SELECT Id, Price, StartDate, EndDate
FROM StartAndEndDatesOnSameRow
WHERE IsStartDate = 1
ORDER BY Id, StartDate
If you do not have a date table, then you can easily create one. There are tons of examples of this around the web.
I hope this helps!
You can define the beginning of a period as one that does not overlap. That is tricky, but can be done using exists
or a cumulative maximum of the end date excluding the current row.
Then, each non-overlap is the beginning of a group. That group can be used for aggregation:
select id, min(startDate) as startDate, max(endDate) as endDate, min(price) as price
from (select t.*,
sum(case when prev_endDate < dateadd(second, -1, startDate)
then 1 else 0
end) over (partition by id order by startdate) as grp
from (select t.*,
max(endDate) over (partition by id
order by startdate
rows between unbounded preceding and 1 preceding
) as prev_endDate
from t
) t
) t
group by id, grp;
I'm not 100% sure that this works. I just thought of using the cumulative maximum end date for this. I'm pretty sure it covers all overlapping cases, but I might have missed something.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With