I have the following table:
links:
created_at           active 
2017-08-12 15:46:01  false
2017-08-13 15:46:01  true
2017-08-14 15:46:01  true
2017-08-15 15:46:01  false
When given a date range, I have to extract time series which tells me how many active links were created on a date equal or smaller than current (rolling) date.
Output (for date range 2017-08-12 - 2017-08-17):
day          count
2017-08-12   0 (there are 0 active links created on 2017-08-12 and earlier)
2017-08-13   1 (there is 1 active link created on 2017-08-13 and earlier)
2017-08-14   2 (there are 2 active links created on 2017-08-14 and earlier)
2017-08-15   2 ...
2017-08-16   2
2017-08-17   2
I came up with the following query for generating dates:
SELECT date_trunc('day', dd):: date
FROM generate_series
    ( '2017-08-12'::timestamp 
    , '2017-08-17'::timestamp
    , '1 day'::interval) dd
But the rolling counts confuse me and am unsure how to continue. Can this be solved with a window function?
This should be fastest:
SELECT day::date
     , sum(ct) OVER (ORDER BY day) AS count
FROM   generate_series (timestamp '2017-08-12'
                      , timestamp '2017-08-17'
                      , interval  '1 day') day
LEFT   JOIN  (
   SELECT date_trunc('day', created_at) AS day, count(*) AS ct
   FROM   tbl
   WHERE  active -- fastest
   GROUP  BY 1
   ) t USING (day)
ORDER  BY 1;
dbfiddle here
count() only counts non-null rows, so you could use count(active OR NULL). But the fastest option for counting is to exclude irrelevant rows with a WHERE clause to begin with. Since we are adding all days with generate_series() anyway, this is the best option.
Compare:
Since generate_series() returns timestamp (not date) I use date_trunc() to get matching timestamps (very slightly faster).
I would just use aggregation and cumulative sums -- assuming you have at least one per day:
select date_trunc('day', created_at)::date as created_date,
       sum(active::int) as actives,
       sum(sum(active::int)) over (date_trunc('day', created_at)) as running_actives
from t
group by created_date;
You only need to generate the dates if you have holes in the data.  If you do, though, I would recommend including where active -- you can include it now, I just want to be sure there are no holes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With