I have a MySQL table with the fields id and string. The ids are unique. The strings are varchars and are non-unique.
I perform the following query:
SELECT id, string, COUNT( * ) AS frequency
FROM table
GROUP BY string
ORDER BY frequency DESC, id ASC
Questions
Assume the table contains three rows with identical string values, and ids 1, 2, and 3.
1. Which id is going to be returned (1, 2, or 3)?
2. Which id is this query going to ORDER BY (same as is returned? ... see question 1)?
3. Can you control which id is returned / used for ordering? E.g. return the largest id, or the first id from a GROUP.
What I'm ultimately trying to do is get the frequency of occurrence for identical strings, order by that frequency from highest to lowest, and on a frequency tie order by id, with the smallest id from each group being the one returned and ordered by. I made the situation more generic to figure out how MySQL handles it.
The MySQL GROUP BY clause collects data from multiple records and groups the result by one or more columns. It is generally used in a SELECT statement, together with aggregate functions such as COUNT, SUM, MIN, MAX, or AVG applied to each group.
The GROUP BY clause restricts the rows of the result: only one row appears for each distinct value in the grouping column or columns.
Which id is going to be returned ( 1, 2, or 3 )?
A: For all the records that have the same string value, the server will choose whichever id it wants (most likely the fastest to fetch, which is unpredictable). To cite the official documentation:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Much more information in this link.
Which id is this query going to ORDER BY ( Same as is returned? ... see question 1 )?
There is little point in asking which order the rows will come back in, because you can't predict which id each group will report. Most likely the result is sorted by whatever unpredictable id value was chosen for each group.
Can you control which id is returned / used for ordering? eg. Return the largest id, or the first id from a GROUP.
At this point you should assume that you can't. Read the documentation again.
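The bare id column can't be controlled, but an aggregate function makes the choice explicit. This is a minimal sketch rather than part of the original answer. It assumes the table and column names from the question, and the alias min_id is made up for illustration.

```sql
-- `table` is the placeholder name from the question. TABLE is a reserved
-- word in MySQL, so it is backtick-quoted here.
SELECT MIN(id) AS min_id,      -- smallest id in each group, chosen explicitly
       string,
       COUNT(*) AS frequency   -- number of rows sharing this string
FROM `table`
GROUP BY string
ORDER BY frequency DESC,       -- most frequent strings first
         min_id ASC;           -- ties broken by the smallest id of each group
```

For the three rows in the question (same string, ids 1, 2, and 3), this returns one row with min_id 1 and frequency 3. Swapping MIN for MAX would return the largest id instead.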
To make things even clearer: you can't predict the result of an improperly used GROUP BY clause. MySQL allows you to use GROUP BY in a non-standard way, but you need to know how to make use of that feature. The point of it is to let you select non-grouped fields that you know will always have the same value within each group. For example:
SELECT id, name, COUNT( * ) AS frequency
FROM table
GROUP BY id
Here, you know name has a single value per group, because id functionally determines name, so you know the result is valid. If you also grouped by name, the query would be more standard but would perform slightly worse in MySQL.
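For comparison, the "more standard" form mentioned above might look like the sketch below. It assumes the same placeholder table with id and name columns.

```sql
-- Every selected non-aggregate column also appears in GROUP BY,
-- so no value is left for the server to pick arbitrarily.
-- `table` is backtick-quoted because TABLE is a reserved word in MySQL.
SELECT id, name, COUNT(*) AS frequency
FROM `table`
GROUP BY id, name;
```

As far as I know, from MySQL 5.7.5 onward the ONLY_FULL_GROUP_BY SQL mode is enabled by default and the server detects functional dependence. That means the GROUP BY id version should still be accepted when id is the primary key, while a query that selects a column not determined by the grouping columns is rejected with an error. Check this against the documentation for your version.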
As a final note, in my experience the values returned by those non-standard queries for the selected, non-grouped fields are usually the ones you would get by applying a GROUP BY and then an ORDER BY on that field. That is why it so often seems to work. However, if you keep testing you will eventually find that this only happens about 95% of the time, and you cannot rely on that number.
The documentation says that when you do not group by all non-aggregate columns, one row is returned for each unique combination of the grouped-by columns. Which row is selected is up to the server, i.e. "random".
However, in practice it is the first row encountered during processing. You can control which is encountered first by selecting from an inner query that is ordered in the order of preference of return.
For example to get the lowest id for each name (yes, undocumented, blah blah, but it works!):
SELECT id, name, COUNT( * ) AS frequency
FROM (select * from table order by id) x
GROUP BY name
ORDER BY frequency DESC, id ASC
I personally am comfortable relying on this behaviour and have never seen or heard of it behaving differently in real life. Many shun this as undocumented and "risky", but if it works, it works.
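If you would rather not depend on that behaviour, one documented alternative is to compute the lowest id per group with MIN and join back to the table to fetch the rest of that row. The aliases t, g, and min_id are made up for illustration. My understanding is that newer MySQL versions may ignore an ORDER BY inside a derived table when the outer query is grouped, which is one more reason to prefer an explicit form.

```sql
-- `table` is backtick-quoted because TABLE is a reserved word in MySQL.
SELECT t.id, t.name, g.frequency
FROM `table` AS t
JOIN (
    -- One row per name: its lowest id and how many rows share the name.
    SELECT name, MIN(id) AS min_id, COUNT(*) AS frequency
    FROM `table`
    GROUP BY name
) AS g ON t.id = g.min_id      -- keep only the lowest-id row of each group
ORDER BY g.frequency DESC, t.id ASC;
```

Because t is a full row of the table, any other columns of the lowest-id row can be added to the select list without making the result indeterminate.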