Currently I have this rather large query that works by
count()
of an event grouped by the event name and the date.avg()
group by just event, doing a union of the results, and because I want to have a separate column for daily, weekly, monthly, putting a filler value of 0 into empty columns.The query is pretty large though, and I feel like I'm doing a lot of repetitive work. Is there any way to do this query better or make it smaller? I haven't really done queries like this before so I'm not quite sure.
WITH monthly_counts as (
SELECT
event,
count(*) as count
FROM tracking_stuff
WHERE
event = 'thing'
OR event = 'thing2'
OR event = 'thing3'
GROUP BY event, date_trunc('month', created_at)
),
weekly_counts as (
SELECT
event,
count(*) as count
FROM tracking_stuff
WHERE
event = 'thing'
OR event = 'thing2'
OR event = 'thing3'
GROUP BY event, date_trunc('week', created_at)
),
daily_counts as (
SELECT
event,
count(*) as count
FROM tracking_stuff
WHERE
event = 'thing'
OR event = 'thing2'
OR event = 'thing3'
GROUP BY event, date_trunc('day', created_at)
),
query as (
SELECT
event,
0 as daily_avg,
0 as weekly_avg,
avg(count) as monthly_avg
FROM monthly_counts
GROUP BY event
UNION
SELECT
event,
0 as daily_avg,
avg(count) as weekly_avg,
0 as monthly_avg
FROM weekly_counts
GROUP BY event
UNION
SELECT
event,
avg(count) as daily_avg,
0 as weekly_avg,
0 as monthly_avg
FROM daily_counts
GROUP BY event
)
SELECT
event,
sum(daily_avg) as daily_avg,
sum(weekly_avg) as weekly_avg,
sum(monthly_avg) as monthly_avg
FROM query
GROUP BY event;
I'd write the query in a way like this:
select event, daily_avg, weekly_avg, monthly_avg
from (
select event, avg(count) monthly_avg
from (
select event, count(*)
from tracking_stuff
where event in ('thing1', 'thing2', 'thing3')
group by event, date_trunc('month', created_at)
) s
group by 1
) monthly
join (
select event, avg(count) weekly_avg
from (
select event, count(*)
from tracking_stuff
where event in ('thing1', 'thing2', 'thing3')
group by event, date_trunc('week', created_at)
) s
group by 1
) weekly using(event)
join (
select event, avg(count) daily_avg
from (
select event, count(*)
from tracking_stuff
where event in ('thing1', 'thing2', 'thing3')
group by event, date_trunc('day', created_at)
) s
group by 1
) daily using(event)
order by 1;
If the where
condition eliminates a significant portion of the data (say more than a half) the use of cte
could slightly speed up the query execution:
with the_data as (
select event, created_at
from tracking_stuff
where event in ('thing1', 'thing2', 'thing3')
)
select event, daily_avg, weekly_avg, monthly_avg
from (
select event, avg(count) monthly_avg
from (
select event, count(*)
from the_data
group by event, date_trunc('month', created_at)
) s
group by 1
) monthly
-- etc ...
Just for curiosity I've done a test on the data:
create table tracking_stuff (event text, created_at timestamp);
insert into tracking_stuff
select 'thing' || random_int(9), '2016-01-01'::date+ random_int(365)
from generate_series(1, 1000000);
In every query I've replaced thing
with thing1
, so the queries eliminate about 2/3 of rows.
Average execution time of 10 tests:
Original query 1106 ms
My query without cte 1077 ms
My query with cte 902 ms
Clodoaldo's query 5187 ms
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With