Why does SQL force me to repeat all non-aggregated fields from my SELECT clause in my GROUP BY clause? [closed]

Tags: sql, group-by

This has bugged me for a long time.

99% of the time, the GROUP BY clause is an exact copy of the SELECT clause, minus the aggregate functions (MAX, SUM, etc.).
This breaks the Don't Repeat Yourself principle.

When can the GROUP BY clause not contain an exact copy of the SELECT clause minus the aggregate functions?

Edit:

I realise that some implementations allow you to have different fields in the GROUP BY than in the SELECT (hence 99%, not 100%), but surely that's a very minor exception?
Can someone explain what is supposed to be returned if you use different fields?

Thanks.

asked Jan 06 '09 13:01 by AJ.


2 Answers

I tend to agree with you - this is one of many cases where SQL should have slightly smarter defaults to save us all some typing. For example, imagine if this were legal:

Select ClientName, InvoiceAmount, Sum(PaymentAmount) Group By * 

where "*" meant "all the non-aggregate fields". If everybody knew that's how it worked, then there would be no confusion. You could sub in a specific list of fields if you wanted to do something tricky, but the splat means "all of 'em" (which in this context means, all the possible ones).

Granted, "*" means something different here than in the SELECT clause, so maybe a different character would work better:

Select ClientName, InvoiceAmount, Sum(PaymentAmount) Group By ! 

There are a few other areas like that where SQL just isn't as eloquent as it could be. But at this point, it's probably too entrenched to make many big changes like that.

answered Oct 10 '22 06:10 by Ian Varley


Because they are two different things: you can group by items that aren't in the SELECT clause.
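
As a minimal sketch of grouping by a column that is not selected, the snippet below runs a query against an in-memory SQLite database. The `Payments` table and its rows are made up for illustration; only the column names `ClientName` and `PayAmt` echo the statements in this answer.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payments (ClientName TEXT, PayAmt INTEGER)")
conn.executemany(
    "INSERT INTO Payments VALUES (?, ?)",
    [("Acme", 100), ("Acme", 50), ("Globex", 70)],
)

# ClientName drives the grouping but does not appear in the SELECT list.
rows = conn.execute(
    "SELECT SUM(PayAmt) AS PayTot FROM Payments "
    "GROUP BY ClientName ORDER BY PayTot"
).fetchall()
print(rows)  # one total per client, without the client name
```

The result has one row per client even though the client is never returned, which is why the GROUP BY list cannot always be derived from the SELECT list.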

EDIT:

Also, is it safe to make that assumption?

I have a SQL statement:

Select ClientName, InvAmt, Sum(PayAmt) as PayTot 

Is it "correct" for the server to assume I want to group by ClientName AND InvAmt? I personally prefer (and think it's safer) to have this code

Select ClientName, InvAmt, Sum(PayAmt) as PayTot Group By ClientName 

throw an error, prompting me to change the code to

Select ClientName, Sum(InvAmt) as InvTot, Sum(PayAmt) as PayTot Group By ClientName 
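
To make the difference concrete, here is a rough sketch that runs both candidate readings against an in-memory SQLite database. The `ClientPayments` table and its rows are assumptions invented for the example; the statements above do not name a table.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ClientPayments (ClientName TEXT, InvAmt INTEGER, PayAmt INTEGER)"
)
conn.executemany(
    "INSERT INTO ClientPayments VALUES (?, ?, ?)",
    [("Acme", 100, 40), ("Acme", 200, 60), ("Globex", 50, 50)],
)

# Reading 1: what an implicit "group by every non-aggregated column" would do.
per_invoice = conn.execute(
    "SELECT ClientName, InvAmt, SUM(PayAmt) AS PayTot FROM ClientPayments "
    "GROUP BY ClientName, InvAmt ORDER BY ClientName, InvAmt"
).fetchall()

# Reading 2: the corrected query, one row per client.
per_client = conn.execute(
    "SELECT ClientName, SUM(InvAmt) AS InvTot, SUM(PayAmt) AS PayTot "
    "FROM ClientPayments GROUP BY ClientName ORDER BY ClientName"
).fetchall()

print(per_invoice)
print(per_client)
```

With this data the first query returns three rows and the second returns two, so a server that silently picked the first reading would hide the mistake instead of reporting it.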

answered Oct 10 '22 06:10 by Binary Worrier