I have a table that records a row for each time a score for a location has changed.
score_history:
This was done with an eye on efficiency and being able to simply retrieve a list of changes for a given location and serves that purpose nicely.
I'm trying to output the data in a very redundant format to help load it into a rigid external system. The external system expects a row for each location * every date. The goal is to represent the last score value for each location for each date. So if the score changed 3 times in a given date only the score closest to midnight would be considered that locations closing score for the day. I imagine this is similar to the challenge of creating a close of business inventory level fact table.
I have a handy star schema style date dimension table which has a row for every date fully covering this sample period and well into the future.
That table looks like
dw_dim_date:
So, if I had only 3 records in the score_history table...
1, 2019-01-01:10:13:01, 100, 5.0
2, 2019-01-05:20:00:01, 100, 5.8
3, 2019-01-05:23:01:22, 100, 6.2
The desired output would be:
2019-01-01, 100, 5.0
2019-01-02, 100, 5.0
2019-01-03, 100, 5.0
2019-01-04, 100, 5.0
2019-01-05, 100, 6.2
3 Requirements:
I've been chasing my tail through subqueries and window functions.
Because I'm hesitant to post something without something I tried I'll share this trainwreck which produces output but of no meaning...
SELECT dw_dim_date.date,
(SELECT score
FROM score_history
WHERE score_history.happened_at::DATE < dw_dim_date.date
OR score_history.happened_at::DATE = dw_dim_date.date
ORDER BY score_history.id desc limit 1) as last_score
FROM dw_dim_date
WHERE dw_dim_date.date > '2019-06-01'
Grateful for guidance or pointers to other questions to read.
You could achieve it with usage of correlated subqueries and LATERAL
:
SELECT sub.date, sub.location_id, score
FROM (SELECT * FROM dw_dim_date
CROSS JOIN (SELECT DISTINCT location_id FROM score_history) s
WHERE date >= '2019-01-01'::date) sub
,LATERAL(SELECT score FROM score_history sc
WHERE sc.happened_at::date <= sub.date
AND sc.location_id = sub.location_id
ORDER BY happened_at DESC LIMIT 1) l
,LATERAL(SELECT MIN(happened_at::date) m1, MAX(happened_at::date) m2
FROM score_history sc
WHERE sc.location_id = sub.location_id) lm
WHERE sub.date BETWEEN lm.m1 AND lm.m2
ORDER BY location_id, date;
db<>fiddle demo
How it works:
1) s
(it is cross join of all dates per location_id)
2) l
(selecting score per location)
3) lm
(selecting min/max date per location for filtering)
4) WHERE
filter dates on range that is available, it could be relaxed if needed
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With