Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do you need to include a field in GROUP BY when using OVER (PARTITION BY x)?

I have a table for which I want to do a simple sum of a field, grouped by two columns. I then want the total for all values for each year_num.

See example: http://rextester.com/QSLRS68794

This query is throwing: "42803: column "foo.num_cust" must appear in the GROUP BY clause or be used in an aggregate function", and I cannot figure out why. Why would an aggregate function using the OVER (PARTITION BY x) require the summed field to be in GROUP BY??

select 
    year_num
    ,age_bucket
    ,sum(num_cust)
    --,sum(num_cust) over (partition by year_num)  --THROWS ERROR!!
from
    foo
group by
    year_num
    ,age_bucket
order by 1,2

TABLE:

| loc_id |  year_num |  gen |  cust_category |  cust_age |  num_cust |  age_bucket |
|--------|-----------|------|----------------|-----------|-----------|-------------|
| 1      | 2016      | M    | cash           | 41        | 2         | 04_<45      |
| 1      | 2016      | F    | Prepaid        | 41        | 1         | 03_<35      |
| 1      | 2016      | F    | cc             | 61        | 1         | 05_45+      |
| 1      | 2016      | F    | cc             | 19        | 2         | 02_<25      |
| 1      | 2016      | M    | cc             | 64        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 46        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 27        | 3         | 03_<35      |
| 1      | 2016      | M    | cash           | 42        | 1         | 04_<45      |
| 1      | 2017      | F    | cc             | 35        | 1         | 04_<45      |
| 1      | 2017      | F    | cc             | 37        | 1         | 04_<45      |
| 1      | 2017      | F    | cash           | 46        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 19        | 4         | 02_<25      |
| 1      | 2017      | M    | cash           | 43        | 1         | 04_<45      |
| 1      | 2017      | M    | cash           | 29        | 1         | 03_<35      |
| 1      | 2016      | F    | cc             | 13        | 1         | 01_<18      |
| 1      | 2017      | F    | cash           | 16        | 2         | 01_<18      |
| 1      | 2016      | F    | cc             | 17        | 2         | 01_<18      |
| 1      | 2016      | M    | cc             | 17        | 2         | 01_<18      |
| 1      | 2017      | F    | cash           | 18        | 9         | 02_<25      |

DESIRED OUTPUT:

| year_num | age_bucket | sum | sum over (year_num) |
|----------|------------|-----|---------------------|
| 2016     | 01_<18     | 5   | 21                  |
| 2016     | 02_<25     | 6   | 21                  |
| 2016     | 03_<35     | 4   | 21                  |
| 2016     | 04_<45     | 3   | 21                  |
| 2016     | 05_45+     | 3   | 21                  |
| 2017     | 01_<18     | 2   | 16                  |
| 2017     | 02_<25     | 9   | 16                  |
| 2017     | 03_<35     | 1   | 16                  |
| 2017     | 04_<45     | 3   | 16                  |
| 2017     | 05_45+     | 1   | 16                  |
like image 305
psrpsrpsr Avatar asked Oct 06 '17 19:10

psrpsrpsr


1 Answers

You need to nest the sum()s:

select year_num, age_bucket, sum(num_cust),
       sum(sum(num_cust)) over (partition by year_num)  --WORKS!!
from foo
group by year_num, age_bucket
order by 1, 2;

Why? Well, the window function is not doing aggregation. The argument needs to be an expression that can be evaluated after the group by (because this is an aggregation query). Because num_cust is not a group by key, it needs an aggregation function.

Perhaps this is clearer if you used a subquery:

select year_num, age_bucket, sum_num_cust,
       sum(sum_num_cust) over (partition by year_num)
from (select year_num, age_bucket, sum(num_cust) as sum_num_cust
      from foo
      group by year_num, age_bucket
     ) ya
order by 1, 2;

These two queries do exactly the same thing. But with the subquery it should be more obvious why you need the extra aggregation.

like image 152
Gordon Linoff Avatar answered Sep 20 '22 14:09

Gordon Linoff