This has bugged me for a long time.
99% of the time, the GROUP BY clause is an exact copy of the SELECT clause, minus the aggregate functions (MAX, SUM, etc.).
This breaks the Don't Repeat Yourself principle.
When can the GROUP BY clause not contain an exact copy of the SELECT clause minus the aggregate functions?
I realise that some implementations allow you to have different fields in the GROUP BY than in the SELECT (hence 99%, not 100%), but surely that's a very minor exception?
Can someone explain what is supposed to be returned if you use different fields?
Thanks.
When you use GROUP BY, every column in the SELECT list that is not inside an aggregate function must also appear in the GROUP BY clause. Each GROUP BY item can be a column name, an expression, or, in some databases, the ordinal position of a column in the select list.
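The three forms of GROUP BY item can be tried side by side with Python's built-in sqlite3 module. This is a minimal sketch: the payment table and its rows are made up for illustration, and grouping by ordinal position is shown only because SQLite accepts it. Not every database does.

```python
import sqlite3

# Hypothetical payment table with made-up rows, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(1, 5), (1, 7), (2, 3), (3, 4), (3, 6)],
)

# 1. GROUP BY a column name.
by_name = conn.execute(
    "SELECT customer_id, SUM(amount) FROM payment "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()

# 2. GROUP BY the ordinal position of a select-list column
#    (SQLite accepts this; not every database does).
by_ordinal = conn.execute(
    "SELECT customer_id, SUM(amount) FROM payment GROUP BY 1 ORDER BY 1"
).fetchall()

# 3. GROUP BY an expression that also appears in the select list.
by_expr = conn.execute(
    "SELECT customer_id % 2 AS parity, SUM(amount) FROM payment "
    "GROUP BY customer_id % 2 ORDER BY parity"
).fetchall()

print(by_name)     # [(1, 12), (2, 3), (3, 10)]
print(by_ordinal)  # [(1, 12), (2, 3), (3, 10)]
print(by_expr)     # [(0, 3), (1, 22)]
```

Forms 1 and 2 return the same rows; form 3 groups customers 1 and 3 together because they share the same parity.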
You can use the GROUP BY clause without applying an aggregate function. For example, a query that selects customer_id from the payment table and groups the result by customer_id returns each customer id exactly once. In this case, GROUP BY works like the DISTINCT clause, which removes duplicate rows from the result set.
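A minimal sketch of the GROUP BY / DISTINCT equivalence, using Python's sqlite3 module and a hypothetical payment table with made-up rows:

```python
import sqlite3

# Hypothetical payment table: customers 1 and 3 each have two rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(1, 5), (1, 7), (2, 3), (3, 4), (3, 6)],
)

# GROUP BY with no aggregate: one output row per distinct customer_id.
grouped = conn.execute(
    "SELECT customer_id FROM payment GROUP BY customer_id ORDER BY customer_id"
).fetchall()

# DISTINCT over the same column gives the same rows.
distinct = conn.execute(
    "SELECT DISTINCT customer_id FROM payment ORDER BY customer_id"
).fetchall()

print(grouped)              # [(1,), (2,), (3,)]
print(grouped == distinct)  # True
```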
If the ONLY_FULL_GROUP_BY SQL mode is enabled (it is by default as of MySQL 5.7.5), MySQL rejects queries in which the select list, HAVING condition, or ORDER BY list refers to nonaggregated columns that are neither named in the GROUP BY clause nor functionally dependent on the GROUP BY columns.
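The rule can be modelled in a few lines to make "functionally dependent" concrete. This is a toy sketch, not MySQL's implementation: the function name and the `determines` map are hypothetical, and it only handles single-column determinants (for example, a primary key determining the other columns of its row), whereas real functional dependencies can involve several columns.

```python
def full_group_by_violations(select_cols, group_by_cols, determines):
    """Toy model of the ONLY_FULL_GROUP_BY rule (not MySQL's actual code).

    select_cols:   nonaggregated columns used in the select list
    group_by_cols: columns named in the GROUP BY clause
    determines:    hypothetical map from a single column to the set of
                   columns it functionally determines
    Returns the columns that would cause the query to be rejected.
    """
    # Collect every column fixed by the GROUP BY columns, following
    # functional dependencies transitively with a worklist.
    covered = set(group_by_cols)
    pending = list(group_by_cols)
    while pending:
        col = pending.pop()
        for dep in determines.get(col, ()):
            if dep not in covered:
                covered.add(dep)
                pending.append(dep)
    # Anything left over is neither grouped nor functionally dependent.
    return [c for c in select_cols if c not in covered]


# customer.id is assumed to be the primary key of a hypothetical customer table.
deps = {"customer.id": {"customer.name", "customer.email"}}

print(full_group_by_violations(["customer.id", "customer.name"], ["customer.id"], deps))
# []
print(full_group_by_violations(["customer.name", "payment.amount"], ["customer.id"], deps))
# ['payment.amount']
```

The first query would be accepted because customer.name is determined by customer.id; the second would be rejected because payment.amount is not.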
We cannot use aggregate functions in the WHERE clause because WHERE filters individual rows before they are grouped. In contrast, HAVING can use aggregate functions because it filters groups after aggregation.
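A small sketch of the difference, again using Python's sqlite3 module with a hypothetical payment table and made-up rows. It shows WHERE and HAVING in one query, then shows that SQLite refuses an aggregate inside WHERE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(1, 5), (1, 7), (2, 3), (3, 4), (3, 6)],
)

# WHERE drops individual rows before grouping (customer 2's only row goes);
# HAVING then drops whole groups after SUM() is computed for each one.
rows = conn.execute(
    "SELECT customer_id, SUM(amount) AS total FROM payment "
    "WHERE amount > 3 "
    "GROUP BY customer_id "
    "HAVING SUM(amount) > 10 "
    "ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 12)]

# Putting the aggregate in WHERE is rejected when the statement is prepared.
try:
    conn.execute(
        "SELECT customer_id FROM payment WHERE SUM(amount) > 10 GROUP BY customer_id"
    )
    where_failed = False
except sqlite3.Error as exc:
    where_failed = True
    print("error:", exc)
```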
I tend to agree with you; this is one of many cases where SQL should have slightly smarter defaults to save us all some typing. For example, imagine if this were legal:
Select ClientName, InvoiceAmount, Sum(PaymentAmount) Group By *
where "*" meant "all the non-aggregate fields". If everybody knew that's how it worked, there would be no confusion. You could sub in a specific list of fields if you wanted to do something tricky, but the splat means "all of 'em" (which in this context means all the possible ones).
Granted, "*" means something different here than in the SELECT clause, so maybe a different character would work better:
Select ClientName, InvoiceAmount, Sum(PaymentAmount) Group By !
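Until something like that exists in your database, the shorthand can be emulated on the client side. The sketch below uses a hypothetical helper, `build_grouped_query`, which takes select items with the aggregates flagged explicitly (so it does no SQL parsing) and emits a GROUP BY containing every non-aggregate item. The `payments` table and its rows are made up, and the generated query is checked with Python's sqlite3 module.

```python
import sqlite3


def build_grouped_query(select_items, table):
    """Hypothetical helper emulating the proposed 'GROUP BY *' shorthand.

    select_items: list of (sql_expression, is_aggregate) pairs. The caller
    flags aggregates explicitly, so no SQL parsing is attempted.
    """
    select_list = ", ".join(expr for expr, _ in select_items)
    group_list = ", ".join(expr for expr, is_agg in select_items if not is_agg)
    sql = f"SELECT {select_list} FROM {table}"
    if group_list:  # only add GROUP BY when there is something to group on
        sql += f" GROUP BY {group_list}"
    return sql


items = [
    ("ClientName", False),
    ("InvoiceAmount", False),
    ("SUM(PaymentAmount)", True),
]
sql = build_grouped_query(items, "payments")
print(sql)
# SELECT ClientName, InvoiceAmount, SUM(PaymentAmount) FROM payments GROUP BY ClientName, InvoiceAmount

# Run it against a made-up table to check the generated SQL is valid.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (ClientName TEXT, InvoiceAmount INTEGER, PaymentAmount INTEGER)"
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [("Acme", 100, 40), ("Acme", 100, 60), ("Bolt", 50, 50)],
)
result = sorted(conn.execute(sql).fetchall())
print(result)  # [('Acme', 100, 100), ('Bolt', 50, 50)]
```

Flagging aggregates by hand keeps the helper honest: deciding whether an arbitrary expression is an aggregate would require a real SQL parser.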
There are a few other areas like that where SQL just isn't as eloquent as it could be. But at this point, it's probably too entrenched to make many big changes like that.
They are two different things: you can group by items that aren't in the select clause.
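For example, with a hypothetical payment table (made-up rows, Python's sqlite3 module), you can group by customer_id without selecting it. A common reason to do this is to feed per-group totals into an outer aggregate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(1, 5), (1, 7), (2, 3), (3, 4), (3, 6)],
)

# customer_id drives the grouping but is not in the select list,
# so each row is one customer's total with no label attached.
totals = conn.execute(
    "SELECT SUM(amount) FROM payment GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(totals)  # [(12,), (3,), (10,)]

# Typical use: the largest per-customer total, without listing customers.
(largest,) = conn.execute(
    "SELECT MAX(total) FROM "
    "(SELECT SUM(amount) AS total FROM payment GROUP BY customer_id)"
).fetchone()
print(largest)  # 12
```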
EDIT:
Also, is it safe to make that assumption?
Say I have this SQL statement:
Select ClientName, InvAmt, Sum(PayAmt) as PayTot
Is it "correct" for the server to assume I want to group by ClientName AND InvAmt? I personally prefer (and think it's safer) to have this code
Select ClientName, InvAmt, Sum(PayAmt) as PayTot Group By ClientName
throw an error, prompting me to change the code to
Select ClientName, Sum(InvAmt) as InvTot, Sum(PayAmt) as PayTot Group By ClientName
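Whether you actually get that error depends on the database. The sketch below uses Python's sqlite3 module with a hypothetical invoices table (one row per invoice, made-up amounts). SQLite is lenient: it accepts the ambiguous query and returns InvAmt from an arbitrary row in each group, which is the silent guess the question worries about. PostgreSQL, and MySQL with ONLY_FULL_GROUP_BY enabled, reject that query instead.

```python
import sqlite3

# Hypothetical invoices table: one row per invoice, with the amount paid on it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (ClientName TEXT, InvAmt INTEGER, PayAmt INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("Acme", 100, 60), ("Acme", 200, 200), ("Bolt", 50, 0)],
)

# Grouping by both non-aggregate columns: one row per (client, invoice amount).
by_both = conn.execute(
    "SELECT ClientName, InvAmt, SUM(PayAmt) AS PayTot FROM invoices "
    "GROUP BY ClientName, InvAmt ORDER BY ClientName, InvAmt"
).fetchall()
print(by_both)  # [('Acme', 100, 60), ('Acme', 200, 200), ('Bolt', 50, 0)]

# The ambiguous version: SQLite raises no error, but InvAmt for 'Acme'
# comes from an arbitrary row of that group, so don't rely on it.
ambiguous = conn.execute(
    "SELECT ClientName, InvAmt, SUM(PayAmt) AS PayTot FROM invoices "
    "GROUP BY ClientName ORDER BY ClientName"
).fetchall()

# The corrected version: every non-grouped column is aggregated.
fixed = conn.execute(
    "SELECT ClientName, SUM(InvAmt) AS InvTot, SUM(PayAmt) AS PayTot "
    "FROM invoices GROUP BY ClientName ORDER BY ClientName"
).fetchall()
print(fixed)  # [('Acme', 300, 260), ('Bolt', 50, 0)]
```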