
GROUP BY behavior when no aggregate functions are present in the SELECT clause

Tags:

sql

mysql

I have a table emp with following structure and data:

name   dept    salary
-----  -----   ------
Jack   a       2
Jill   a       1
Tom    b       2
Fred   b       1

When I execute the following SQL:

SELECT * FROM emp GROUP BY dept 

I get the following result:

name   dept    salary
-----  -----   ------
Jill   a       1
Fred   b       1

On what basis did the server decide to return Jill and Fred and exclude Jack and Tom?

I am running this query in MySQL.

Note 1: I know the query doesn't make sense on its own. I am trying to debug a problem with a 'GROUP BY' scenario. I am trying to understand the default behavior for this purpose.

Note 2: I am used to writing the SELECT clause the same as the GROUP BY clause (minus the aggregate fields). When I came across the behavior described above, I started wondering whether I can rely on it for scenarios such as selecting the rows from the emp table where the salary is the lowest/highest in the dept. For example, SQL statements like this work on MySQL:

SELECT A.*, MIN(A.salary) AS min_salary FROM emp AS A GROUP BY A.dept 
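A portable way to get the lowest-paid row(s) per dept, without depending on which row MySQL picks for the non-aggregated columns of A.*, is to compute MIN(salary) per dept in a derived table and join it back to emp. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption made so it runs without a server); the SQL itself is standard and should be accepted by MySQL as well.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for MySQL so this runs without a
# server; the query below is standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp (name, dept, salary) VALUES (?, ?, ?)",
    [("Jack", "a", 2), ("Jill", "a", 1), ("Tom", "b", 2), ("Fred", "b", 1)],
)

# Derived table M holds the minimum salary per dept. Joining it back to emp
# means every column of A.* comes from a row that really has that minimum.
LOWEST_PAID_SQL = """
SELECT A.*
FROM emp AS A
JOIN (SELECT dept, MIN(salary) AS min_salary
      FROM emp
      GROUP BY dept) AS M
  ON A.dept = M.dept AND A.salary = M.min_salary
ORDER BY A.dept
"""

rows = conn.execute(LOWEST_PAID_SQL).fetchall()
print(rows)  # [('Jill', 'a', 1), ('Fred', 'b', 1)]
```

If two employees in a dept share the lowest salary, both rows are returned, since each one matches the join condition.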

I didn't find any material describing why such SQL works or, more importantly, whether I can rely on this behavior consistently. If this behavior is reliable, then I can avoid queries like:

SELECT A.*
FROM emp AS A
WHERE A.salary = (SELECT MAX(B.salary) FROM emp B WHERE B.dept = A.dept)
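For comparison, the correlated subquery above returns a deterministic result. The sketch below runs it against the sample data, again assuming an in-memory SQLite database as a stand-in for MySQL. ORDER BY is added only so the output order is predictable, and the extra row for Jane is hypothetical, added to show that tied salaries all come back, which a GROUP BY returning one row per dept cannot express.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for MySQL; the WHERE clause is the
# correlated subquery from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp (name, dept, salary) VALUES (?, ?, ?)",
    [("Jack", "a", 2), ("Jill", "a", 1), ("Tom", "b", 2), ("Fred", "b", 1)],
)

# For each row A, the subquery finds the highest salary in A's dept and keeps
# A only if it matches. ORDER BY just fixes the output order.
HIGHEST_PAID_SQL = """
SELECT A.*
FROM emp AS A
WHERE A.salary = (SELECT MAX(B.salary)
                  FROM emp AS B
                  WHERE B.dept = A.dept)
ORDER BY A.dept, A.name
"""

rows = conn.execute(HIGHEST_PAID_SQL).fetchall()
print(rows)  # [('Jack', 'a', 2), ('Tom', 'b', 2)]

# Hypothetical extra row tying Jack's salary: both tied rows are returned.
conn.execute("INSERT INTO emp (name, dept, salary) VALUES ('Jane', 'a', 2)")
rows_with_tie = conn.execute(HIGHEST_PAID_SQL).fetchall()
print(rows_with_tie)  # [('Jack', 'a', 2), ('Jane', 'a', 2), ('Tom', 'b', 2)]
```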
asked Oct 20 '09 00:10 by Harish Shetty




2 Answers

Read the MySQL documentation on this particular point.

In a nutshell, MySQL allows omitting some columns from the GROUP BY for performance purposes. However, this works only if the omitted columns all have the same value within a grouping; otherwise, the values returned by the query are indeed indeterminate, as others in this post have correctly guessed. To be sure, adding an ORDER BY clause would not reintroduce any form of deterministic behavior.

Although not at the core of the issue, this example shows how using * rather than an explicit enumeration of desired columns is often a bad idea.

Excerpt from MySQL 5.0 documentation:

 When using this feature, all rows in each group should have the same values for the columns that are omitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.  
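When every selected column is either listed in GROUP BY or wrapped in an aggregate, there is nothing for the server to pick from an arbitrary row, so the result is deterministic. This is also the form that MySQL's ONLY_FULL_GROUP_BY SQL mode (enabled by default since MySQL 5.7.5) accepts. A minimal sketch, again assuming an in-memory SQLite database as a stand-in for MySQL:

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for MySQL; the query is standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp (name, dept, salary) VALUES (?, ?, ?)",
    [("Jack", "a", 2), ("Jill", "a", 1), ("Tom", "b", 2), ("Fred", "b", 1)],
)

# dept is in GROUP BY; every other output column is an aggregate, so no value
# has to be taken from an arbitrary row of the group.
SUMMARY_SQL = """
SELECT dept,
       MIN(salary) AS min_salary,
       MAX(salary) AS max_salary,
       COUNT(*)    AS headcount
FROM emp
GROUP BY dept
ORDER BY dept
"""

rows = conn.execute(SUMMARY_SQL).fetchall()
print(rows)  # [('a', 1, 2, 2), ('b', 1, 2, 2)]
```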
answered Oct 13 '22 23:10 by mjv


This is a bit late, but I'll put this up for future reference.

The GROUP BY takes the first row of each group and discards any matching rows that come after it in the result set. So if two people share a department, such as Jack and Jill, whoever appears first in a normal SELECT will be the resulting row in the GROUP BY.

If you want to control which row appears first, you need an ORDER BY. However, SQL does not allow ORDER BY to come before GROUP BY; doing so is a syntax error. The best workaround for this issue is to do the ORDER BY in a subquery and then the GROUP BY in the outer query. Here's an example:

SELECT * FROM (SELECT * FROM emp ORDER BY name) as foo GROUP BY dept 

This is the best-performing technique I've found. I hope this helps someone out.
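On servers with window functions (MySQL 8.0 and later, SQLite 3.25 and later), the same goal of keeping the first row of each dept in name order can also be written so that the ordering is part of the query itself instead of an ORDER BY inside a subquery. This swaps in ROW_NUMBER() in place of the technique above; a sketch assuming an in-memory SQLite database as a stand-in for MySQL:

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for MySQL. Window functions need
# SQLite 3.25+ here, or MySQL 8.0+ on the MySQL side.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp (name, dept, salary) VALUES (?, ?, ?)",
    [("Jack", "a", 2), ("Jill", "a", 1), ("Tom", "b", 2), ("Fred", "b", 1)],
)

# ROW_NUMBER() numbers the rows of each dept in name order; keeping rn = 1
# selects the first row per dept regardless of physical row order.
FIRST_BY_NAME_SQL = """
SELECT name, dept, salary
FROM (SELECT name, dept, salary,
             ROW_NUMBER() OVER (PARTITION BY dept ORDER BY name) AS rn
      FROM emp) AS ranked
WHERE rn = 1
ORDER BY dept
"""

rows = conn.execute(FIRST_BY_NAME_SQL).fetchall()
print(rows)  # [('Jack', 'a', 2), ('Fred', 'b', 1)]
```

Replacing ROW_NUMBER() with RANK() would keep every row tied for first place instead of exactly one per dept.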

answered Oct 14 '22 01:10 by Samuel Hodge