Surprise -- this is a perfectly valid query in MySQL:
select X, Y from someTable group by X
If you tried this query in Oracle or SQL Server, you’d get the natural error message:
Column 'Y' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
So how does MySQL determine which Y to show for each X? It just picks one. From what I can tell, it just picks the first Y it finds. The rationale being, if Y is neither an aggregate function nor in the group by clause, then specifying “select Y” in your query makes no sense to begin with. Therefore, I as the database engine will return whatever I want, and you’ll like it.
There’s even a MySQL configuration parameter to turn off this “looseness”. http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by
This article even mentions how MySQL has been criticized for being ANSI-SQL non-compliant in this regard. http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html
My question is: Why was MySQL designed this way? What was their rationale for breaking with ANSI-SQL?
Do not use the GROUP BY clause without an aggregate function. The rule is deprecated in the new SQL Server versions. The execution plan generated by using the GROUP BY is not different than the one of using DISTINCT.
You can use the GROUP BY clause without applying an aggregate function.
Expressions that are not encapsulated within an aggregate function and must be included in the GROUP BY Clause at the end of the SQL statement. This is an aggregate function such as the SUM, COUNT, MIN, MAX, or AVG functions.
Guidelines of using GROUP BY clause (3) GROUP BY clause can only be used with aggregate functions like SUM, AVG, COUNT, MAX, and MIN. If it is used with single row functions,Oracle throws an exception as "ORA-00979: not a GROUP BY expression". (4) Aggregate functions cannot be used in a GROUP BY clause.
According to this page (the 5.0 online manual), it's for better performance and user convenience.
I believe that it was to handle the case where grouping by one field would imply other fields are also being grouped:
SELECT user.id, user.name, COUNT(post.*) AS posts FROM user LEFT OUTER JOIN post ON post.owner_id=user.id GROUP BY user.id
In this case the user.name will always be unique per user.id, so there is convenience in not requiring the user.name in the GROUP BY
clause (although, as you say, there is definite scope for problems)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With