I have a table in my PG db that looks somewhat like this:
id | widget_id | for_date | score |
Each referenced widget has a lot of these items. It's always 1 per day per widget, but there are gaps.
What I want to get is a result that contains all the widgets for each date since X. The dates come from generate_series():
SELECT date.date::date
FROM generate_series('2012-01-01'::timestamp with time zone,'now'::text::date::timestamp with time zone, '1 day') date(date)
ORDER BY date.date DESC;
If there is no entry for a date for a given widget_id, I want to use the previous one. For example, if widget 1337 has no entry for 2012-05-10 but has one for 2012-05-08, the result set should show the 2012-05-08 entry for 2012-05-10 as well:
Actual data:
widget_id | for_date | score
1336 | 2012-05-07 | 20
1337 | 2012-05-07 | 12
1337 | 2012-05-08 | 41
1337 | 2012-05-11 | 500
Desired output based on generate series:
widget_id | for_date | score
1336 | 2012-05-07 | 20
1337 | 2012-05-07 | 12
1336 | 2012-05-08 | 20
1337 | 2012-05-08 | 41
1336 | 2012-05-09 | 20
1337 | 2012-05-09 | 41
1336 | 2012-05-10 | 20
1337 | 2012-05-10 | 41
1336 | 2012-05-11 | 20
1337 | 2012-05-11 | 500
Eventually I want to boil this down into a view so I have consistent data sets per day that I can query easily.
Edit: Made the sample data and expected resultset clearer
SQL Fiddle
select
    widget_id,
    for_date,
    case
        when score is not null then score
        else first_value(score) over (partition by widget_id, c order by for_date)
    end score
from (
    select
        a.widget_id,
        a.for_date,
        s.score,
        count(score) over(partition by a.widget_id order by a.for_date) c
    from (
        select widget_id, g.d::date for_date
        from (
            select distinct widget_id
            from score
        ) s
        cross join
        generate_series(
            (select min(for_date) from score),
            (select max(for_date) from score),
            '1 day'
        ) g(d)
    ) a
    left join
    score s on a.widget_id = s.widget_id and a.for_date = s.for_date
) s
order by widget_id, for_date
First of all, you can have a much simpler generate_series() table expression. Equivalent to yours (except for the descending order, which contradicts the rest of your question anyway):
SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date
The type date is coerced to timestamptz automatically on input. The return type is timestamptz either way. I use a subquery below, so I can cast the output to date right away.
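A quick way to check the type resolution described above is pg_typeof(). This is only a small sketch; the three-day range and the aliases g(d) and d(day) are illustrative choices, not part of the original query:

```sql
-- The date arguments are coerced on input, so the series itself
-- is reported as "timestamp with time zone".
SELECT pg_typeof(g.d) AS series_type
FROM   generate_series('2012-01-01'::date, '2012-01-03'::date, '1d') g(d)
LIMIT  1;

-- Casting inside a subquery hands plain dates to the outer query,
-- so this one reports "date".
SELECT pg_typeof(d.day) AS day_type
FROM  (
    SELECT generate_series('2012-01-01'::date, '2012-01-03'::date, '1d')::date
) d(day)
LIMIT  1;
```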
Next, max() as window function returns exactly what you need: the highest value since frame start, ignoring NULL values. Building on that, you get a radically simple query.
Most likely faster than involving CROSS JOIN or WITH RECURSIVE:
SELECT a.day, s.*
FROM (
    SELECT d.day
          ,max(s.for_date) OVER (ORDER BY d.day) AS effective_date
    FROM (
        SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date
    ) d(day)
    LEFT JOIN score s ON s.for_date = d.day
                     AND s.widget_id = 1337 -- "for a given widget_id"
) a
LEFT JOIN score s ON s.for_date = a.effective_date
                 AND s.widget_id = 1337
ORDER BY a.day;
With this query you can put any column from score you like into the final SELECT list. I put s.* for simplicity. Pick your columns.
If you want to start your output with the first day that actually has a score, simply replace the last LEFT JOIN with JOIN.
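To see why the running max() fills the gaps, here is a minimal, self-contained sketch. It does not touch the score table: the two VALUES lists are made-up stand-ins for the calendar days and for the days that actually have a row for one widget.

```sql
-- d(day): four consecutive calendar days (hypothetical sample).
-- s(for_date): only two of those days have a score row.
SELECT d.day,
       s.for_date,                                         -- NULL on gap days
       max(s.for_date) OVER (ORDER BY d.day) AS effective_date
FROM  (VALUES ('2012-05-08'::date), ('2012-05-09'), ('2012-05-10'), ('2012-05-11')) d(day)
LEFT JOIN (VALUES ('2012-05-08'::date), ('2012-05-11')) s(for_date)
       ON s.for_date = d.day
ORDER BY d.day;
-- effective_date: 2012-05-08, 2012-05-08, 2012-05-08, 2012-05-11
```

With ORDER BY in the window definition, the default frame runs from the start of the partition to the current row, and max() skips the NULLs produced by the LEFT JOIN, so each gap day inherits the latest earlier date that has a row.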
Here I use a CROSS JOIN to produce a row for every widget on every date:
SELECT a.day, a.widget_id, s.score
FROM (
    SELECT d.day, w.widget_id
          ,max(s.for_date) OVER (PARTITION BY w.widget_id
                                 ORDER BY d.day) AS effective_date
    FROM (SELECT generate_series('2012-05-05'::date
                                ,'2012-05-15'::date, '1d')::date AS day) d
    CROSS JOIN (SELECT DISTINCT widget_id FROM score) AS w
    LEFT JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
) a
JOIN score s ON s.for_date = a.effective_date
            AND s.widget_id = a.widget_id -- instead of LEFT JOIN
ORDER BY a.day, a.widget_id;
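Since the question mentions boiling this down into a view, one possible way to wrap the query above is sketched here. The view name widget_score_daily is hypothetical, and the start date '2012-01-01' is simply taken from the question; adjust both as needed.

```sql
CREATE VIEW widget_score_daily AS       -- hypothetical view name
SELECT a.day AS for_date, a.widget_id, s.score
FROM (
    SELECT d.day, w.widget_id
          ,max(s.for_date) OVER (PARTITION BY w.widget_id
                                 ORDER BY d.day) AS effective_date
    FROM (SELECT generate_series('2012-01-01'::date   -- assumed start date "X"
                                ,now()::date, '1d')::date AS day) d
    CROSS JOIN (SELECT DISTINCT widget_id FROM score) AS w
    LEFT JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
) a
JOIN score s ON s.for_date = a.effective_date
            AND s.widget_id = a.widget_id;

-- Example usage: all widgets as of one day.
SELECT *
FROM   widget_score_daily
WHERE  for_date = '2012-05-10'
ORDER  BY widget_id;
```

Because now() is evaluated when the view is queried, the series extends to the current day on every call. A view definition is better left without ORDER BY, so the sort goes in the outer query.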