Whenever we use an aggregate function in SQL (MIN, MAX, AVG etc), we must always GROUP BY all non-aggregated columns, for instance:
SELECT storeid, storename, SUM(revenue), COUNT(*)
FROM Sales
GROUP BY storeid, storename
It becomes even more intrusive when we use a function or other calculation in our SELECT statement, as this must also be copied to the GROUP BY clause.
SELECT (2 * (x + y)) / z + 1, MyFunction(x, y), SUM(z)
FROM AnotherTable
GROUP BY (2 * (x + y)) / z + 1, MyFunction(x, y)
If we ever change the SELECT statement, we must remember to make the same change to our GROUP BY clause.
So is the GROUP BY clause redundant?
The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions ( COUNT() , MAX() , MIN() , SUM() , AVG() ) to group the result-set by one or more columns.
The major difference between DISTINCT and GROUP BY is that GROUP BY is meant for aggregating or grouping rows, whereas DISTINCT is just used to get distinct values.
I understand the use of GROUP BY. The question is based on the fact that it returns a distinct dataset when no aggregate function is present, because GROUP BY implicitly does a DISTINCT over the values of the columns you're grouping by (sorry for the cacophony).
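To make the "implicit DISTINCT" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The Sales rows are made up for illustration, and other database engines may differ in details such as default ordering, so treat it as a demonstration rather than a guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (storeid INTEGER, storename TEXT, revenue REAL)")
# Hypothetical sample data: store 1 appears twice.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, "North", 10.0), (1, "North", 5.0), (2, "South", 7.0)],
)

# DISTINCT over the two columns.
distinct_rows = conn.execute(
    "SELECT DISTINCT storeid, storename FROM Sales ORDER BY storeid"
).fetchall()

# GROUP BY over the same columns, with no aggregate function at all.
grouped_rows = conn.execute(
    "SELECT storeid, storename FROM Sales "
    "GROUP BY storeid, storename ORDER BY storeid"
).fetchall()

print(distinct_rows)
print(grouped_rows)
```

Both queries collapse the duplicate store row, which is why GROUP BY without an aggregate looks like DISTINCT.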
Key differences between GROUP BY and ORDER BY: the GROUP BY clause is used to group rows that share the same value in a specific column. The ORDER BY clause, on the other hand, sorts the result in ascending or descending order.
"Whenever we use an aggregate function in SQL (MIN, MAX, AVG etc), we must always GROUP BY all non-aggregated columns"
This is not true in general. MySQL, for example, doesn't require this (at least when the ONLY_FULL_GROUP_BY SQL mode is disabled), and the SQL standard doesn't say this either.
"It becomes even more intrusive when we use a function or other calculation in our SELECT statement, as this must also be copied to the GROUP BY clause."
Also not true in general. MySQL (and perhaps other databases too) allows column aliases to be used in the GROUP BY clause:
SELECT (2 * (x + y)) / z + 1 AS a, MyFunction(x, y) AS b, SUM(z)
FROM AnotherTable
GROUP BY a, b
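As a rough check of the alias form, the sketch below runs both variants against SQLite (which also accepts aliases in GROUP BY) through Python's sqlite3 module. MyFunction is not defined anywhere in this thread, so the stand-in registered here (x - y) and the sample rows are purely hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for MyFunction; the real one is not shown in the post.
conn.create_function("MyFunction", 2, lambda x, y: x - y)
conn.execute("CREATE TABLE AnotherTable (x INTEGER, y INTEGER, z INTEGER)")
conn.executemany(
    "INSERT INTO AnotherTable VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 2, 3), (2, 2, 4)],
)

# Variant 1: expressions repeated verbatim in GROUP BY.
repeated = conn.execute(
    """
    SELECT (2 * (x + y)) / z + 1, MyFunction(x, y), SUM(z)
    FROM AnotherTable
    GROUP BY (2 * (x + y)) / z + 1, MyFunction(x, y)
    ORDER BY 1, 2
    """
).fetchall()

# Variant 2: GROUP BY refers to the column aliases instead.
aliased = conn.execute(
    """
    SELECT (2 * (x + y)) / z + 1 AS a, MyFunction(x, y) AS b, SUM(z)
    FROM AnotherTable
    GROUP BY a, b
    ORDER BY a, b
    """
).fetchall()

print(repeated == aliased)
```

Whether aliases are accepted in GROUP BY depends on the database, so it is worth checking your own engine before relying on this.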
"If this is not the case, then what extra functionality does GROUP BY give us?"
The only way of specifying what to group by is to use a GROUP BY clause. You cannot necessarily deduce it from the columns mentioned in the SELECT. In fact you don't even have to select all the columns mentioned in the GROUP BY:
SELECT MAX(col2)
FROM foo
GROUP BY col1
HAVING COUNT(*) = 2
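Here is a small sketch of that query running on SQLite via Python's sqlite3 module, with made-up rows in foo. The ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (col1 TEXT, col2 INTEGER)")
# Hypothetical data: groups 'a' and 'd' have exactly two rows each.
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [("a", 1), ("a", 5), ("b", 2), ("c", 3), ("c", 4), ("c", 9), ("d", 7), ("d", 8)],
)

# col1 drives the grouping but never appears in the SELECT list.
rows = conn.execute(
    """
    SELECT MAX(col2)
    FROM foo
    GROUP BY col1
    HAVING COUNT(*) = 2
    ORDER BY 1
    """
).fetchall()

print(rows)
```

Nothing in the SELECT list mentions col1, so the grouping could not have been inferred from the selected columns alone.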
I may agree with what you're saying, but it is not redundant in all cases.
Consider this:
SELECT FirstName
+ ' (' + REPLACE(Address1, ',', ' ') + ' '
+ REPLACE(Address2, ',', ' ') + ', '
+ UPPER(State) + ' '
+ 'USA)',
COUNT(*)
FROM Profiles
GROUP BY FirstName, Address1, Address2, State
In this case I just want the number of same-first-name, same-address profiles.
As you can see, I didn't have to repeat the "complex" operations of the SELECT in the GROUP BY statement.
I think that allowing this flexibility ("sometimes like this, sometimes like that") is what costs you the repetition most of the time.
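For anyone who wants to try the Profiles query, here is a sketch adapted to SQLite through Python's sqlite3 module. SQLite concatenates strings with || rather than +, so the expression is rewritten accordingly. The table contents and the ORDER BY are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Profiles (FirstName TEXT, Address1 TEXT, Address2 TEXT, State TEXT)"
)
# Hypothetical profiles: two Anns share the same address.
conn.executemany(
    "INSERT INTO Profiles VALUES (?, ?, ?, ?)",
    [
        ("Ann", "1 Main St,Apt 2", "Springfield", "il"),
        ("Ann", "1 Main St,Apt 2", "Springfield", "il"),
        ("Ann", "9 Oak Ave", "Dover", "de"),
        ("Bob", "1 Main St,Apt 2", "Springfield", "il"),
    ],
)

# The SELECT formats a label; the GROUP BY lists only the raw columns.
rows = conn.execute(
    """
    SELECT FirstName
           || ' (' || REPLACE(Address1, ',', ' ') || ' '
           || REPLACE(Address2, ',', ' ') || ', '
           || UPPER(State) || ' '
           || 'USA)',
           COUNT(*)
    FROM Profiles
    GROUP BY FirstName, Address1, Address2, State
    ORDER BY FirstName, Address1
    """
).fetchall()

for label, count in rows:
    print(label, count)
```

The formatted label is built only from grouped columns, which is why it does not need to be repeated in the GROUP BY.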