I'm trying to calculate the cumulative sum of columns t and s over a date from my sample data below, using Presto SQL.
Date | T | S
1/2/19 | 2 | 5
2/1/19 | 5 | 1
3/1/19 | 1 | 1
I would like to get
Date | T | S | cum_T | cum_S
1/2/19 | 2 | 5 | 2 | 5
2/1/19 | 5 | 1 | 7 | 6
3/1/19 | 1 | 1 | 8 | 7
However when I run the below query using Presto SQL I am receiving an unexpected error message, telling me to put columns T and S into the group by section of my query.
Is this expected? When I remove the group by from my query it runs without error, but produces duplicate date rows. +
select
date_trunc('day',tb1.date),
sum(tb1.S) over (partition by date_trunc('day',tb1.date) order by date_trunc('day',tb1.date) rows unbounded preceding ) as cum_S,
sum(tb1.T) over (partition by date_trunc('day',tb1.date) order by date_trunc('day',tb1.date) rows unbounded preceding) as cum_T
from esi_dpd_bi_esds_prst.points_tb1_use_dedup_18months_vw tb1
where
tb1.reason_id not in (45,264,418,983,990,997,999,1574)
and tb1.group_id not in (22)
and tb1.point_status not in (3)
and tb1.date between cast(DATE '2019-01-01' as date) and cast( DATE '2019-01-03' as date)
group by
1
order by date_trunc('day',tb1.date) desc
Error looks like this:
Error: line 3:1: '"sum"(tb1.S) OVER (PARTITION BY "date_trunc"('day', tb1.tb1) ORDER BY "date_trunc"('day', tb1.tb1) ASC ROWS UNBOUNDED PRECEDING)' must be an aggregate expression or appear in GROUP BY clause.
You have an aggregation query and you want to mix the aggregations with window functions. The correct syntax is:
select date_trunc('day', tb1.date),
sum(tbl1.S) as S,
sum(tbl1.T) as T,
sum(sum(tb1.S)) over (order by date_trunc('day', tb1.date) rows unbounded preceding ) as cum_S,
sum(sum(tb1.T)) over (order by date_trunc('day', tb1.date) rows unbounded preceding) as cum_T
from esi_dpd_bi_esds_prst.points_tb1_use_dedup_18months_vw tb1
where tb1.reason_id not in (45, 264, 418, 983, 990, 997, 999, 1574) and
tb1.group_id not in (22) and
tb1.point_status not in (3) and
tb1.date between cast(DATE '2019-01-01' as date) and cast( DATE '2019-01-03' as date)
group by 1
order by date_trunc('day', tb1.date) desc ;
That is, the window function is running after the aggregation and needs to process the aggregated value.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With