Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MIN or MAX Value for Date Ranges - Determining lowest price for a given date range based on Product ID, Price, and Date Ranges

I really hope some of you like challenges. I have a table of Product Ids, Prices and the Date Ranges for when those prices are active.

+----+-------+---------------------+---------------------+
| Id | Price |      StartDate      |       EndDate       |
+----+-------+---------------------+---------------------+
|  1 |    19 | 2016-12-01 00:00:00 | 2017-12-01 23:59:59 |
|  1 |    18 | 2017-01-01 00:00:00 | 2018-01-12 23:59:59 |
|  1 |    17 | 2017-02-03 00:00:00 | 2017-03-03 23:59:59 |
|  1 |    16 | 2018-01-01 00:00:00 | 2018-03-02 23:59:59 |
|  2 |    15 | 2017-01-01 00:00:00 | 2017-03-05 23:59:59 |
|  2 |    15 | 2017-03-06 00:00:00 | 2017-03-31 23:59:59 |
|  2 |    30 | 2017-04-01 00:00:00 | 2017-05-03 23:59:59 |
|  3 |    12 | 2017-01-01 00:00:00 | 2017-01-31 23:59:59 |
|  3 |    12 | 2017-02-01 00:00:00 | 2017-02-28 23:59:59 |
|  4 |    14 | 2017-01-01 00:00:00 | 2017-04-05 23:59:59 |
|  4 |    14 | 2017-04-01 00:00:00 | 2017-04-30 23:59:59 |
|  4 |    12 | 2017-04-15 00:00:00 | 2017-05-30 23:59:59 |
|  5 |    20 | 2017-01-01 00:00:00 | 2017-01-31 23:59:59 |
|  5 |    20 | 2017-03-01 00:00:00 | 2017-03-31 23:59:59 |
|  6 |    15 | 2017-01-01 00:00:00 | 2017-01-31 23:59:59 |
|  6 |    15 | 2017-02-01 00:00:00 | 2017-02-28 23:59:59 |
|  6 |    15 | 2017-04-01 00:00:00 | 2017-04-30 23:59:59 |
+----+-------+---------------------+---------------------+

SQLFiddle: http://sqlfiddle.com/#!6/39288/1

I need to get it in a format where:

  1. Date periods have the same Id and price that "touch" (i.e. Id #3) are merged into one period.

  2. Date periods that overlap (i.e. Id #4) are merged into one period.

  3. The lowest price is shown for each product and during what range.

  4. Date ranges that have gaps and the same price don't merge and are separate rows (i.e. Id #5).

The result should be:

+----+-------+---------------------+---------------------+
| Id | Price |      StartDate      |       EndDate       |
+----+-------+---------------------+---------------------+
|  1 |    19 | 2016-12-01 00:00:00 | 2016-12-31 23:59:59 |
|  1 |    18 | 2017-01-01 00:00:00 | 2017-02-02 23:59:59 |
|  1 |    17 | 2017-02-03 00:00:00 | 2017-03-03 23:59:59 |
|  1 |    19 | 2017-03-04 00:00:00 | 2017-12-01 23:59:59 |
|  1 |    18 | 2017-12-02 00:00:00 | 2017-12-31 23:59:59 |
|  1 |    16 | 2018-01-01 00:00:00 | 2018-03-02 23:59:59 |
|  2 |    15 | 2017-01-01 00:00:00 | 2017-03-31 23:59:59 |
|  2 |    30 | 2017-04-01 00:00:00 | 2017-05-03 23:59:59 |
|  3 |    12 | 2017-01-01 00:00:00 | 2017-02-28 23:59:59 |
|  4 |    14 | 2017-01-01 00:00:00 | 2017-04-14 23:59:59 |
|  4 |    12 | 2017-04-15 00:00:00 | 2017-05-30 23:59:59 |
|  5 |    20 | 2017-01-01 00:00:00 | 2017-01-31 23:59:59 |
|  5 |    20 | 2017-03-01 00:00:00 | 2017-03-31 23:59:59 |
|  6 |    15 | 2017-01-01 00:00:00 | 2017-02-28 23:59:59 |
|  6 |    15 | 2017-04-01 00:00:00 | 2017-04-30 23:59:59 |
+----+-------+---------------------+---------------------+

Overall, it's essentially determining the best price between two dates.

I've worked with this table in the past and was able to solve it in C#, but this time I need a pure TSQL approach.

I've already gone down some deep nested CTE's and lost my mind with getting results nowhere near what they should be. Thanks in advance for anyone who can assist.

Edit: I even messed up the desired results because this is so confusing. Fixed (I think).

Edit 2: Example:

+------+-------+-------------------------+-------------------------+
|  Id  | Price |        StartDate        |         EndDate         |
+------+-------+-------------------------+-------------------------+
| 8611 | 31.98 | 2017-06-06 00:00:00.000 | 2017-09-24 23:59:59.000 |
| 8611 | 31.98 | 2017-09-25 00:00:00.000 | 2017-12-31 23:59:59.000 |
| 8611 | 28.78 | 2017-07-31 00:00:00.000 | 2017-09-30 23:59:59.000 |
| 8611 | 28.78 | 2017-10-30 00:00:00.000 | 2017-12-31 23:59:59.000 |
+------+-------+-------------------------+-------------------------+

@GordonLinoff's results to:

+------+-------+-------------------------+-------------------------+
|  Id  | Price |        StartDate        |         EndDate         |
+------+-------+-------------------------+-------------------------+
| 8611 | 28.78 | 2017-06-06 00:00:00.000 | 2017-12-31 23:59:59.000 |
+------+-------+-------------------------+-------------------------+

Result should be:

+------+-------+-------------------------+-------------------------+
|  Id  | Price |        StartDate        |         EndDate         |
+------+-------+-------------------------+-------------------------+
| 8611 | 31.98 | 2017-06-06 00:00:00.000 | 2017-07-30 23:59:59.000 |
| 8611 | 28.78 | 2017-07-31 00:00:00.000 | 2017-09-30 23:59:59.000 |
| 8611 | 31.98 | 2017-10-01 00:00:00.000 | 2017-10-29 23:59:59.000 |
| 8611 | 28.78 | 2017-10-30 00:00:00.000 | 2017-12-31 23:59:59.000 |
+------+-------+-------------------------+-------------------------+
like image 701
justiceorjustus Avatar asked Oct 18 '22 08:10

justiceorjustus


2 Answers

Do you have a Calendar/Date table available? If so, then you can use the date table to help you get the minimum price per product for each date within the periods in your table.

Afterwards you can get the start and end dates of each of your periods by looking at the next and previous records with the same product Id. You can use the LAG and LEAD functions to do this. This gives you the outer bounds of each of your desired groups.

From there it's just a bit of fiddling around to get your final result. I've provided an example below, which should give you the results you want.

--Get the best price per date for each product
WITH BestPricePerDate AS (
    SELECT 
        Id,
        MIN(Price) Price,
        c.[Date]
    FROM [YourTable] yt
        INNER JOIN dbo.Calendar c
            ON c.[Date] BETWEEN yt.StartDate AND yt.EndDate
    GROUP BY Id, [Date]
),
--Check whether the date is the start or the end of a period
PeriodsMarkedPerId AS(
    SELECT 
        Id,
        Price,
        [Date],
        CASE WHEN 
            ISNULL(LAG(Price,1) OVER (PARTITION BY Id ORDER BY [Date]),-1) <> Price 
            OR ISNULL(LAG([Date],1) OVER (PARTITION BY Id ORDER BY [Date]),'1999-01-01') <> DATEADD(DAY,-1,[Date]) THEN 1 ELSE 0 END IsStartDate,
        CASE WHEN 
            ISNULL(LEAD(Price,1) OVER (PARTITION BY Id ORDER BY [Date]),-1) <> Price 
            OR ISNULL(LEAD([Date],1) OVER (PARTITION BY Id ORDER BY [Date]),'1999-01-01') <> DATEADD(DAY,1,[Date]) THEN 1 ELSE 0 END IsEndDate
    FROM BestPricePerDate
),
--Keep only the start and end date records
PeriodStartAndEndDates AS(
    SELECT 
        Id, 
        Price,
        [Date],
        IsStartDate,
        IsEndDate
    FROM PeriodsMarkedPerId
    WHERE IsStartDate = 1 OR IsEndDate = 1
),
--Move StartDate and EndDate to one record
StartAndEndDatesOnSameRow AS(
    SELECT 
        Id, 
        Price, 
        [Date] AS StartDate,
        LEAD([Date],1) OVER (ORDER BY Id, [Date]) AS EndDate,
        IsStartDate
    FROM PeriodStartAndEndDates
)
--Get the resulting periods
SELECT Id, Price, StartDate, EndDate 
FROM StartAndEndDatesOnSameRow
WHERE IsStartDate = 1
ORDER BY Id, StartDate

If you do not have a date table, then you can easily create one. There are tons of examples of this around the web.

I hope this helps!

like image 108
dybzon Avatar answered Oct 20 '22 22:10

dybzon


You can define the beginning of a period as one that does not overlap. That is tricky, but can be done using exists or a cumulative maximum of the end date excluding the current row.

Then, each non-overlap is the beginning of a group. That group can be used for aggregation:

select id, min(startDate) as startDate, max(endDate) as endDate, min(price) as price
from (select t.*,
             sum(case when prev_endDate < dateadd(second, -1, startDate)
                      then 1 else 0
                 end) over (partition by id order by startdate) as grp
      from (select t.*,
                   max(endDate) over (partition by id
                                      order by startdate
                                      rows between unbounded preceding and 1 preceding
                                     ) as prev_endDate
            from t
           ) t
     ) t
group by id, grp;

I'm not 100% sure that this works. I just thought of using the cumulative maximum end date for this. I'm pretty sure it covers all overlapping cases, but I might have missed something.

like image 38
Gordon Linoff Avatar answered Oct 21 '22 00:10

Gordon Linoff