
Is the GROUP BY clause in SQL redundant?

Tags: sql, group-by

Whenever we use an aggregate function in SQL (MIN, MAX, AVG etc), we must always GROUP BY all non-aggregated columns, for instance:

SELECT storeid, storename, SUM(revenue), COUNT(*)
FROM Sales 
GROUP BY storeid, storename

It becomes even more intrusive when we use a function or other calculation in our SELECT statement, as this must also be copied to the GROUP BY clause.

SELECT (2 * (x + y)) / z + 1, MyFunction(x, y), SUM(z)
FROM AnotherTable
GROUP BY (2 * (x + y)) / z + 1, MyFunction(x, y)

If we ever change the SELECT statement, we must remember to make the same change to our GROUP BY clause.

So is the GROUP BY clause redundant?

  • If this is indeed the case, then why is there a GROUP BY clause in SQL at all?
  • If this is not the case, then what extra functionality does GROUP BY give us?
Mike Chamberlain asked Dec 22 '10 01:12




2 Answers

Whenever we use an aggregate function in SQL (MIN, MAX, AVG etc), we must always GROUP BY all non-aggregated columns

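This is not true in general. MySQL, for example, doesn't require this (when the ONLY_FULL_GROUP_BY SQL mode is disabled), and the SQL standard doesn't say this either: since SQL:1999 it permits non-aggregated columns that are functionally dependent on the GROUP BY columns.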

  • Debunking GROUP BY myths

It becomes even more intrusive when we use a function or other calculation in our SELECT statement, as this must also be copied to the GROUP BY clause.

Also not true in general. MySQL (and perhaps other databases too) allows column aliases to be used in the GROUP BY clause:

SELECT (2 * (x + y)) / z + 1 AS a, MyFunction(x, y) AS b, SUM(z)
FROM AnotherTable
GROUP BY a, b
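
To try the alias form without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (it also happens to accept aliases in GROUP BY); the sample rows are made up, and MyFunction is left out because it is not defined in the question.

```python
import sqlite3

# In-memory database; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AnotherTable (x INTEGER, y INTEGER, z INTEGER)")
conn.executemany(
    "INSERT INTO AnotherTable VALUES (?, ?, ?)",
    [(1, 2, 3), (2, 1, 3), (4, 4, 2)],
)

# The expression is written once in SELECT and referenced by its alias
# in GROUP BY, so there is nothing to keep in sync by hand.
rows = conn.execute(
    """
    SELECT (2 * (x + y)) / z + 1 AS a, SUM(z) AS total_z
    FROM AnotherTable
    GROUP BY a
    ORDER BY a
    """
).fetchall()

print(rows)
```

The first two sample rows evaluate to the same value of a and are collapsed into one group; the third forms its own group. Whether a given database accepts aliases in GROUP BY is engine-specific, so check the documentation for the one you use.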

If this is not the case, then what extra functionality does GROUP BY give us?

The only way of specifying what to group by is to use a GROUP BY clause. You cannot necessarily deduce it from the columns mentioned in the SELECT. In fact you don't even have to select all the columns mentioned in the GROUP BY:

SELECT MAX(col2)
FROM foo
GROUP BY col1
HAVING COUNT(*) = 2
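
As a rough illustration of why the grouping cannot be inferred from the SELECT list, the sketch below runs that query against made-up data with Python's sqlite3 module, next to the same SELECT list with no GROUP BY. The table contents are an assumption for demonstration only.

```python
import sqlite3

# In-memory database; foo's contents are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [
        ("a", 10), ("a", 30),           # group of 2
        ("b", 5),                       # group of 1
        ("c", 7), ("c", 2),             # group of 2
        ("d", 99), ("d", 1), ("d", 3),  # group of 3
    ],
)

# col1 drives the grouping but never appears in the SELECT list.
grouped = sorted(
    conn.execute(
        "SELECT MAX(col2) FROM foo GROUP BY col1 HAVING COUNT(*) = 2"
    ).fetchall()
)

# Identical SELECT list, no GROUP BY: one aggregate over the whole table.
ungrouped = conn.execute("SELECT MAX(col2) FROM foo").fetchall()

print(grouped)
print(ungrouped)
```

Both statements select exactly the same expression, yet they return different results, so the GROUP BY clause carries information that the SELECT list alone does not.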
Mark Byers answered Sep 28 '22 02:09


I may agree with what you're saying, but it is not redundant in all cases.

Consider this:

SELECT FirstName 
       + ' (' + REPLACE(Address1, ',', ' ') + ' '
       + REPLACE(Address2, ',', ' ') + ', '
       + UPPER(State) + ' '
       + 'USA)',
       COUNT(*)
FROM Profiles
GROUP BY FirstName, Address1, Address2, State

In this case I just want the number of same-first-name, same-address profiles.
As you can see, I didn't have to repeat the "complex" operations of the SELECT in the GROUP BY statement.
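
A small runnable sketch of the same idea, using Python's sqlite3 module as a stand-in database. Note the assumptions: SQLite concatenates strings with || rather than +, the label and n aliases are added for readability, and the profile rows are made up.

```python
import sqlite3

# In-memory database; the profiles are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Profiles "
    "(FirstName TEXT, Address1 TEXT, Address2 TEXT, State TEXT)"
)
conn.executemany(
    "INSERT INTO Profiles VALUES (?, ?, ?, ?)",
    [
        ("Ann", "1 Main St", "Apt 2", "ny"),
        ("Ann", "1 Main St", "Apt 2", "ny"),
        ("Bob", "9 Oak Ave", "Unit 4", "ca"),
    ],
)

# SELECT builds a formatted label; GROUP BY lists only the raw columns.
rows = conn.execute(
    """
    SELECT FirstName
           || ' (' || REPLACE(Address1, ',', ' ') || ' '
           || REPLACE(Address2, ',', ' ') || ', '
           || UPPER(State) || ' '
           || 'USA)' AS label,
           COUNT(*) AS n
    FROM Profiles
    GROUP BY FirstName, Address1, Address2, State
    ORDER BY FirstName
    """
).fetchall()

for label, n in rows:
    print(label, n)
```

The two identical Ann rows end up in one group with a count of 2, even though the formatting expression appears only in the SELECT list.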

I think the price of allowing this flexibility ("sometimes like this, sometimes like that") is having to repeat yourself most of the time.

BeemerGuy answered Sep 28 '22 01:09