Which row's fields are returned when Grouping with MySQL?

I have a MySQL table with the fields id and string. ids are unique. strings are varchars and are non-unique.

I perform the following query:

SELECT id, string, COUNT( * ) AS frequency
FROM `table`
GROUP BY string
ORDER BY frequency DESC, id ASC

Questions

Assume the table contains three rows with identical string values, and ids 1, 2, and 3.

  1. Which id is going to be returned ( 1, 2, or 3 )?
  2. Which id is this query going to ORDER BY ( Same as is returned? ... see question 1 )?
  3. Can you control which id is returned / used for ordering? E.g. return the largest id, or the first id from a group.

What I'm ultimately trying to do is get the frequency of each distinct string, order by that frequency from highest to lowest, and, on a frequency tie, order by id, using the smallest id in each group as the id that is returned and ordered by. I made the situation more generic to figure out how MySQL handles it.

T. Brian Jones asked Sep 10 '13 02:09



2 Answers

Which id is going to be returned ( 1, 2, or 3 )?

The server will choose whichever id it wants from among the rows that share the same string (most likely the one that is fastest to fetch, which is unpredictable). To cite the official documentation:

The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.

Much more information in this link.

Which id is this query going to ORDER BY ( Same as is returned? ... see question 1 )?

Since you can't predict which id will be returned, it makes little sense to ask which id the query will order by. However, it is very likely that the result is sorted by the same unpredictable id that is returned.

Can you control which id is returned / used for ordering? E.g. return the largest id, or the first id from a group.

At this point you should assume that you can't. Read the documentation again.

To make things even clearer: you can't predict the result of an improperly used GROUP BY clause. MySQL allows you to use GROUP BY in a non-standard way, but you need to know how to make use of that feature. The point of it is to let you select fields that you know will have the same value in every row of a group. For example:

SELECT id, name, COUNT( * ) AS frequency
FROM `table`
GROUP BY id

Here, you know name has a single value per group, because id functionally determines name, so you know the result is valid. If you also grouped by name, the query would be more standard but would perform slightly worse in MySQL.
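As a hedged aside to the point about functional dependence: since MySQL 5.7.5 the default SQL mode includes ONLY_FULL_GROUP_BY, which rejects a selected, non-aggregated column unless it is functionally dependent on the GROUP BY columns. The sketch below assumes the question's table (id, string); the alias some_id is only illustrative. ANY_VALUE() is a way of saying explicitly that you do not care which value of the group comes back.

```sql
-- Check whether ONLY_FULL_GROUP_BY is part of the current session's SQL mode.
SELECT @@SESSION.sql_mode;

-- With ONLY_FULL_GROUP_BY enabled, selecting a bare id while grouping by string
-- is rejected. ANY_VALUE() (available since MySQL 5.7.5) makes the
-- "any id from the group" intent explicit. The value is still indeterminate.
SELECT ANY_VALUE(id) AS some_id,   -- some_id is an illustrative alias
       string,
       COUNT(*) AS frequency
FROM `table`                       -- backticked because TABLE is a reserved word
GROUP BY string;
```

This only silences the check. It does not make the chosen id predictable.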

As a final note, in my experience the values returned for the selected, non-grouped fields in these non-standard queries are usually the ones you would get by applying the GROUP BY and then an ORDER BY on that field. That is why it so often seems to work. However, if you keep testing, you will eventually find that this happens only about 95% of the time, and you cannot rely on that number.
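For the goal stated in the question (smallest id per group, used both for output and for tie-breaking), one standard way to avoid the indeterminate value altogether is to aggregate id explicitly. This is a minimal sketch rather than part of the answer above; it assumes the question's table and columns, and the alias first_id is hypothetical.

```sql
-- Aggregate id so the value returned for each group is well defined.
SELECT MIN(id) AS first_id,    -- smallest id in the group; use MAX(id) for the largest
       string,
       COUNT(*) AS frequency
FROM `table`                   -- backticked because TABLE is a reserved word
GROUP BY string
ORDER BY frequency DESC,       -- most frequent strings first
         first_id ASC;         -- ties broken by the smallest id of each group
```

Because every selected column is either grouped or aggregated, this form is standard SQL and does not depend on which row the server happens to read first.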

Mosty Mostacho answered Oct 05 '22 08:10


The documentation says that when you do not group by all non-aggregate columns, one row is returned for each unique combination of the grouped-by columns. Which row is selected is up to the server, i.e. effectively "random".

However, in practice it is the first row encountered during processing. You can control which is encountered first by selecting from an inner query that is ordered in the order of preference of return.

For example to get the lowest id for each name (yes, undocumented, blah blah, but it works!):

SELECT id, name, COUNT( * ) AS frequency
FROM (SELECT * FROM `table` ORDER BY id) x
GROUP BY name
ORDER BY frequency DESC, id ASC

I personally am comfortable relying on this behaviour and have never seen or heard of it behaving differently in real life. Many shun this as undocumented and "risky", but if it works, it works.
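If you would rather not depend on undocumented behaviour (newer MySQL versions may ignore an ORDER BY inside a derived table when the outer query is grouped), a documented alternative is to compute the lowest id per group first and then join back to fetch the rest of that row. The following is a sketch under the same assumptions as the example above (columns id and name); the aliases g, t and first_id are illustrative.

```sql
-- Step 1 (derived table g): one row per name with its lowest id and its count.
-- Step 2: join back to the base table to read the full row for that lowest id.
SELECT t.id, t.name, g.frequency
FROM (
    SELECT name,
           MIN(id)  AS first_id,
           COUNT(*) AS frequency
    FROM `table`               -- backticked because TABLE is a reserved word
    GROUP BY name
) AS g
JOIN `table` AS t
  ON t.id = g.first_id
ORDER BY g.frequency DESC, t.id ASC;
```

With only id and name in the table the join is not strictly needed, but it becomes useful once you want other columns from the same lowest-id row.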

Bohemian answered Oct 05 '22 07:10