I have a table for which I want to do a simple sum of a field, grouped by two columns. I then want the total for all values for each year_num.
See example: http://rextester.com/QSLRS68794
This query is throwing: "42803: column "foo.num_cust" must appear in the GROUP BY clause or be used in an aggregate function", and I cannot figure out why. Why would an aggregate function using the OVER (PARTITION BY x) require the summed field to be in GROUP BY??
select
year_num
,age_bucket
,sum(num_cust)
--,sum(num_cust) over (partition by year_num) --THROWS ERROR!!
from
foo
group by
year_num
,age_bucket
order by 1,2
TABLE:
| loc_id | year_num | gen | cust_category | cust_age | num_cust | age_bucket |
|--------|-----------|------|----------------|-----------|-----------|-------------|
| 1 | 2016 | M | cash | 41 | 2 | 04_<45 |
| 1 | 2016 | F | Prepaid | 41 | 1 | 03_<35 |
| 1 | 2016 | F | cc | 61 | 1 | 05_45+ |
| 1 | 2016 | F | cc | 19 | 2 | 02_<25 |
| 1 | 2016 | M | cc | 64 | 1 | 05_45+ |
| 1 | 2016 | F | cash | 46 | 1 | 05_45+ |
| 1 | 2016 | F | cash | 27 | 3 | 03_<35 |
| 1 | 2016 | M | cash | 42 | 1 | 04_<45 |
| 1 | 2017 | F | cc | 35 | 1 | 04_<45 |
| 1 | 2017 | F | cc | 37 | 1 | 04_<45 |
| 1 | 2017 | F | cash | 46 | 1 | 05_45+ |
| 1 | 2016 | F | cash | 19 | 4 | 02_<25 |
| 1 | 2017 | M | cash | 43 | 1 | 04_<45 |
| 1 | 2017 | M | cash | 29 | 1 | 03_<35 |
| 1 | 2016 | F | cc | 13 | 1 | 01_<18 |
| 1 | 2017 | F | cash | 16 | 2 | 01_<18 |
| 1 | 2016 | F | cc | 17 | 2 | 01_<18 |
| 1 | 2016 | M | cc | 17 | 2 | 01_<18 |
| 1 | 2017 | F | cash | 18 | 9 | 02_<25 |
DESIRED OUTPUT:
| year_num | age_bucket | sum | sum over (year_num) |
|----------|------------|-----|---------------------|
| 2016 | 01_<18 | 5 | 21 |
| 2016 | 02_<25 | 6 | 21 |
| 2016 | 03_<35 | 4 | 21 |
| 2016 | 04_<45 | 3 | 21 |
| 2016 | 05_45+ | 3 | 21 |
| 2017 | 01_<18 | 2 | 16 |
| 2017 | 02_<25 | 9 | 16 |
| 2017 | 03_<35 | 1 | 16 |
| 2017 | 04_<45 | 3 | 16 |
| 2017 | 05_45+ | 1 | 16 |
You need to nest the sum()
s:
select year_num, age_bucket, sum(num_cust),
sum(sum(num_cust)) over (partition by year_num) --WORKS!!
from foo
group by year_num, age_bucket
order by 1, 2;
Why? Well, the window function is not doing aggregation. The argument needs to be an expression that can be evaluated after the group by
(because this is an aggregation query). Because num_cust
is not a group by
key, it needs an aggregation function.
Perhaps this is clearer if you used a subquery:
select year_num, age_bucket, sum_num_cust,
sum(sum_num_cust) over (partition by year_num)
from (select year_num, age_bucket, sum(num_cust) as sum_num_cust
from foo
group by year_num, age_bucket
) ya
order by 1, 2;
These two queries do exactly the same thing. But with the subquery it should be more obvious why you need the extra aggregation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With