At some point soon I'll have to prepare a list of prices of items by day. The granularity is one day, and on days when there are sales of an item I'll average the prices to get that day's figure. There will be days when no sales are made, and I'm satisfied that an adequate approximation can be had by pulling the previous and next occurrences of sales and, for each day between them, having a price that transitions linearly from one to the other.
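For what it's worth, getting that per-day average is just a GROUP BY over the individual sales; a minimal sketch, where the table and column names (Sales, Item, SaleDate, Price) are placeholders rather than anything real:
-- Hypothetical per-sale table collapsed to one averaged price per item per day.
SELECT Item, SaleDate, AVG(Price) AS AvgPrice
FROM Sales
GROUP BY Item, SaleDate;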
Imagine the raw data is:
Item Date Price
Bread 2000-01-01 10
Bread 2000-01-02 9.5
Bread 2000-01-04 9.1
Sugar 2000-01-01 100
Sugar 2000-01-11 150
I can get to here:
Item Date Price
Bread 2000-01-01 10
Bread 2000-01-02 9.5
Bread 2000-01-03 NULL
Bread 2000-01-04 9.1
Sugar 2000-01-01 100
Sugar 2000-01-02 NULL
Sugar 2000-01-03 NULL
Sugar 2000-01-04 NULL
Sugar 2000-01-05 NULL
Sugar 2000-01-06 NULL
Sugar 2000-01-07 NULL
Sugar 2000-01-08 NULL
Sugar 2000-01-09 NULL
Sugar 2000-01-10 NULL
Sugar 2000-01-11 150
Where I want to get to is:
Item Date Price
Bread 2000-01-01 10
Bread 2000-01-02 9.5
Bread 2000-01-03 9.3 --being 9.5 + ((9.1 - 9.5) / 2) * 1
Bread 2000-01-04 9.1
Sugar 2000-01-01 100
Sugar 2000-01-02 105 --being 100 + ((150 - 100) / 10) * 1
Sugar 2000-01-03 110 --being 100 + ((150 - 100) / 10) * 2
Sugar 2000-01-04 115
Sugar 2000-01-05 120
Sugar 2000-01-06 125
Sugar 2000-01-07 130
Sugar 2000-01-08 135
Sugar 2000-01-09 140
Sugar 2000-01-10 145 --being 100 + ((150 - 100) / 10) * 9
Sugar 2000-01-11 150
What have I tried so far? Thinking only; I'm planning on doing something like the forward/backward "smear" sketched below. I am, however, wondering if there's a simpler way, because I've got millions of item-days and this doesn't feel like it will be very efficient.
I find lots of examples of questions where the value from the last or next row is smeared verbatim to fill in the gaps, but I don't recall seeing this situation, where some kind of transition is attempted. Perhaps that technique can be applied twice: one smear running forwards, replicating the most recent value, and alongside it one running backwards, replicating the next value:
Item Date DateFwd DateBak PriceF PriceB
Bread 2000-01-01 2000-01-01 2000-01-01 10 10
Bread 2000-01-02 2000-01-02 2000-01-02 9.5 9.5
Bread 2000-01-03 2000-01-02 2000-01-04 9.5 9.1
Bread 2000-01-04 2000-01-04 2000-01-04 9.1 9.1
Sugar 2000-01-01 2000-01-01 2000-01-01 100 100
Sugar 2000-01-02 2000-01-01 2000-01-11 100 150
Sugar 2000-01-03 2000-01-01 2000-01-11 100 150
Sugar 2000-01-04 2000-01-01 2000-01-11 100 150
Sugar 2000-01-05 2000-01-01 2000-01-11 100 150
Sugar 2000-01-06 2000-01-01 2000-01-11 100 150
Sugar 2000-01-07 2000-01-01 2000-01-11 100 150
Sugar 2000-01-08 2000-01-01 2000-01-11 100 150
Sugar 2000-01-09 2000-01-01 2000-01-11 100 150
Sugar 2000-01-10 2000-01-01 2000-01-11 100 150
Sugar 2000-01-11 2000-01-11 2000-01-11 150 150
These might provide the necessary data for the formula:
preceding_price + ((next_price - preceding_price) / gap_distance) * gap_progress
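For the record, here's a rough sketch of how that double smear might be written with windowed MAX/MIN (assuming SQL Server 2012+ and a table, here called Gapped, shaped like the calendar-joined data above with columns item, date, price; this is thinking out loud, not something I've proven efficient):
-- Sketch only: Gapped is the calendar-joined data (item, date, price with NULLs).
;with smeared as
(
    select item, date, price,
           DateFwd = max(case when price is not null then date end)
                         over (partition by item order by date
                               rows between unbounded preceding and current row),
           DateBak = min(case when price is not null then date end)
                         over (partition by item order by date
                               rows between current row and unbounded following)
    from Gapped
)
select s.item,
       s.date,
       price = coalesce(s.price,
                        pf.price + (pb.price - pf.price)
                                   / nullif(datediff(day, s.DateFwd, s.DateBak), 0)
                                   * datediff(day, s.DateFwd, s.date))
from smeared s
left join Gapped pf on pf.item = s.item and pf.date = s.DateFwd
left join Gapped pb on pb.item = s.item and pb.date = s.DateBak;
The NULLIF is only there to guard against a divide-by-zero on days that already have a price.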
Here's DDL for the data that I know I can get to (the raw data, joined with a calendar table):
CREATE TABLE Data
([I] varchar(5), [D] date, [P] DECIMAL(10,5))
;
INSERT Data
([I], [D], [P])
VALUES
('Bread', '2000-01-01', 10),
('Bread', '2000-01-02', 9.5),
('Bread', '2000-01-04', 9.1),
('Sugar', '2000-01-01', 100),
('Sugar', '2000-01-11', 150);
CREATE TABLE Cal([D] DATE);
INSERT Cal VALUES
('2000-01-01'),
('2000-01-02'),
('2000-01-03'),
('2000-01-04'),
('2000-01-05'),
('2000-01-06'),
('2000-01-07'),
('2000-01-08'),
('2000-01-09'),
('2000-01-10'),
('2000-01-11');
SELECT i.[I] AS [item], c.[D] AS [date], d.[P] AS [price]
FROM (SELECT DISTINCT [I] FROM Data) i   -- one row per item
CROSS JOIN Cal c                         -- paired with every calendar date
LEFT JOIN Data d ON d.[I] = i.[I] AND d.[D] = c.[D]
You can use OUTER APPLY to get the previous and next row with a price that is not null:
select d.item,
       d.date,
       case when d.price is null
            then prev.price + (next.price - prev.price)
                              / datediff(day, prev.date, next.date)
                              * datediff(day, prev.date, d.date)
            else d.price
       end as price
from data d
outer apply
(
    select top(1) *
    from data d2
    where d2.item = d.item and d2.date < d.date and d2.price is not null
    order by d2.date desc
) prev
outer apply
(
    select top(1) *
    from data d2
    where d2.item = d.item and d2.date > d.date and d2.price is not null
    order by d2.date
) next;
Rextester demo: http://rextester.com/QBL7472
UPDATE: This is probably slow. Perhaps it helps to add "and d.price is null" to the WHERE clauses of the subqueries, to show the DBMS that it doesn't have to look for other records when the price is already present. Just check the execution plans to see whether that helps.
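For clarity, that suggestion applied to the first OUTER APPLY would look like this (the second one is analogous):
outer apply
(
    select top(1) *
    from data d2
    where d2.item = d.item
      and d2.date < d.date
      and d2.price is not null
      and d.price is null   -- nothing to look up for rows that already have a price
    order by d2.date desc
) prev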
It is easier to generate the missing gap days together with the interpolated price in one go.
So I start off with your original raw data:
CREATE TABLE t
([I] varchar(5), [D] date, [P] DECIMAL(10,2))
;
INSERT INTO t
([I], [D], [P])
VALUES
('Bread', '2000-01-01 00:00:00', '10'),
('Bread', '2000-01-02 00:00:00', '9.5'),
('Bread', '2000-01-04 00:00:00', '9.1'),
('Sugar', '2000-01-01 00:00:00', '100'),
('Sugar', '2000-01-11 00:00:00', '150');
; with
-- number is a tally table; here a recursive CTE generates 100 numbers (0 to 99)
number as
(
select n = 0
union all
select n = n + 1
from number
where n < 99
),
-- a CTE to get the price at the next date, plus cnt = the number of missing days in between
cte as
(
select *,
nextP = lead(P) over(partition by I order by D),
cnt = datediff(day, D, lead(D) over(partition by I order by D)) - 1
from t
)
select I,
D = dateadd(day, n, D),
P = coalesce(c.P + (c.nextP - c.P) / ( cnt + 1) * n, c.P)
from cte c
cross join number n
where n.n <= isnull(c.cnt, 0)
drop table t
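A side note, not something the sample data will hit: the tally above stops at 99, and recursive CTEs default to MAXRECURSION 100, so if an item could go more than about 100 days without a sale you would need to widen the tally and add the hint to the statement that consumes it, roughly like this:
-- Standalone illustration: a wider recursive tally needs the MAXRECURSION hint
-- on the statement that references it (0 = no limit).
;with number as
(
    select n = 0
    union all
    select n = n + 1
    from number
    where n < 999
)
select n
from number
option (maxrecursion 0);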
I would put your formula 100 + ((150 - 100) / 10) * 9 etc. into a scalar UDF and use it in a persisted computed column.
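If you go that route, a minimal sketch of such a UDF (the name and parameters here are illustrative, not from the question) might be:
-- Linear interpolation between the surrounding known prices.
create function dbo.InterpolatePrice
(
    @prevPrice   decimal(10,5),
    @nextPrice   decimal(10,5),
    @gapDistance int,   -- days between the previous and next known prices
    @gapProgress int    -- days elapsed since the previous known price
)
returns decimal(10,5)
as
begin
    return @prevPrice + (@nextPrice - @prevPrice) / @gapDistance * @gapProgress;
end;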