MySQL - Selecting a Column not in Group By

Tags: mysql, group-by


It's true that MySQL's GROUP BY extension permits some ambiguous queries, and silently returns a result set with an arbitrary value picked from the ungrouped column. In practice, it tends to be the value from the row within the group that is physically stored first.

These queries aren't ambiguous if you only choose columns that are functionally dependent on the column(s) in the GROUP BY criteria. In other words, if there can be only one distinct value of the "ambiguous" column per value that defines the group, there's no problem. This query would be illegal in Microsoft SQL Server (and in ANSI SQL-92; later revisions of the standard permit functionally dependent columns as an optional feature), even though it cannot logically result in ambiguity:

SELECT AVG(table1.col1), table1.personID, persons.col4
FROM table1 JOIN persons ON (table1.personID = persons.id)
GROUP BY table1.personID;

Also, MySQL has an SQL mode that makes it behave per the standard: ONLY_FULL_GROUP_BY.
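
For engines that enforce the stricter rule, the query above can be rewritten in a portable form. The following is only a sketch: it reuses the table and column names from the query above and assumes persons.id is unique, so each group really holds a single col4 value.

```sql
-- Option 1: name the dependent column in GROUP BY as well.
-- If persons.id is unique, this produces the same groups as before.
SELECT AVG(table1.col1), table1.personID, persons.col4
FROM table1 JOIN persons ON (table1.personID = persons.id)
GROUP BY table1.personID, persons.col4;

-- Option 2: wrap the dependent column in an aggregate.
-- With one distinct col4 per group, MAX() simply returns that value.
SELECT AVG(table1.col1), table1.personID, MAX(persons.col4) AS col4
FROM table1 JOIN persons ON (table1.personID = persons.id)
GROUP BY table1.personID;
```

Either form states explicitly which value is wanted, so the result does not depend on how the engine happens to pick a row.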

FWIW, SQLite also permits these ambiguous GROUP BY clauses, but it chooses the value from the last row in the group, at least in the version I tested.

What it means to be arbitrary is that either MySQL or SQLite could change its implementation in the future and behave differently. You should therefore not rely on the behavior staying the way it currently is in ambiguous cases like this. It's better to rewrite your queries to be deterministic and unambiguous. That's why MySQL 5.7 enables ONLY_FULL_GROUP_BY by default.
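
On servers where the mode is not already active, it can be inspected and switched on for the current connection. This is a minimal sketch, assuming a MySQL server whose session sql_mode may be empty or may already list other modes:

```sql
-- Show the SQL modes active for this session.
SELECT @@SESSION.sql_mode;

-- Append ONLY_FULL_GROUP_BY to whatever is already set.
-- NULLIF turns an empty mode string into NULL, which CONCAT_WS skips,
-- so no stray leading comma is produced.
SET SESSION sql_mode =
    CONCAT_WS(',', NULLIF(@@SESSION.sql_mode, ''), 'ONLY_FULL_GROUP_BY');
```

The change lasts only for the current session. Once the mode is active, ambiguous GROUP BY queries are rejected with an error instead of returning an arbitrary value.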


I should have Googled for just a bit longer... It seems I found my answer.

MySQL extends the use of GROUP BY so that you can use nonaggregated columns or calculations in the SELECT list that do not appear in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. For example, you do not need to group on customer.name in the following query

In standard SQL, you would have to add customer.name to the GROUP BY clause. In MySQL, the name is redundant.
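
The query that the quoted passage refers to is missing from this page. As an illustration only, a query of the kind it describes might look like the sketch below. The orders and customer tables and their columns are hypothetical, and customer.custid is assumed to be unique.

```sql
-- MySQL extension: customer.name is selected but not grouped.
-- This is unambiguous only because each custid maps to one name.
SELECT o.custid, c.name, MAX(o.payments)
FROM orders AS o
JOIN customer AS c ON o.custid = c.custid
GROUP BY o.custid;

-- SQL-92 form: every non-aggregated column is listed in GROUP BY.
SELECT o.custid, c.name, MAX(o.payments)
FROM orders AS o
JOIN customer AS c ON o.custid = c.custid
GROUP BY o.custid, c.name;
```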

Still, that just seems... wrong.


Let's say you have a query like this:

SELECT g, v 
FROM t
GROUP BY g;

In this case, for each possible value of g, MySQL picks one of the corresponding values of v.

Which one it picks, however, depends on the circumstances.

I read somewhere that for each group of g, the first value of v is kept, in the order in which the records were inserted into the table t.

This is quite ugly, because the records in a table should be treated as a set in which the order of the elements does not matter. This is so "mysql-ish"...

If you want to control which value of v is kept, you can wrap t in a subselect like this:

SELECT g, v 
FROM (
    SELECT * 
        FROM t 
        ORDER BY g, v DESC
) q
GROUP BY g;

This way you define the order in which the outer query processes the records of the subquery, so you can predict which value of v it will pick for each value of g.

However, if you need WHERE conditions, be very careful. If you add the WHERE condition to the subquery, the behaviour is preserved and the query always returns the value you expect:

SELECT g, v 
FROM (
    SELECT * 
        FROM t 
        WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9' 
        ORDER BY g, v DESC
) q
GROUP BY g;

This is what you expect: the subselect filters and orders the table. It keeps the records where g has the given value, and the outer query returns that g and the first value of v.

However, if you add the same WHERE condition to the outer query instead, you get a non-deterministic result:

SELECT g, v 
FROM (
    SELECT * 
        FROM t 
        -- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9' 
        ORDER BY g, v DESC
) q
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g;

Surprisingly, you may get different values for v when executing the same query again and again, which is... strange. The expected behaviour is that the subquery returns all the records in the appropriate order, the outer query filters them, and then it picks the same value as in the previous example. But it does not.

It picks a value for v seemingly at random. The same query returned different values for v when I executed it repeatedly (~20 times), but the distribution was not uniform.

If, instead of adding an outer WHERE, you specify a HAVING condition like this:

SELECT g, v 
FROM (
    SELECT * 
        FROM t 
        -- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9' 
        ORDER BY g, v DESC
) q
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g
HAVING g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9';

Then you get consistent behaviour again.

CONCLUSION: I would suggest not relying on this technique at all. If you really want or need to, avoid WHERE conditions in the outer query. Put the condition in the inner query if you can, or in a HAVING clause in the outer query.
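
If the goal is simply "the largest v per g", it can be stated directly instead of depending on row order. The following is a sketch of deterministic alternatives, not something from the tests described here. It assumes the same table t with columns g and v, and the last variant assumes MySQL 8.0 or later for window functions.

```sql
-- Option 1: ask for the value you want with an aggregate.
SELECT g, MAX(v) AS v
FROM t
GROUP BY g;

-- Option 2: fetch the whole row holding the largest v per g by joining
-- against the grouped maximum. Ties on v return more than one row.
SELECT t.*
FROM t
JOIN (SELECT g, MAX(v) AS max_v FROM t GROUP BY g) AS m
  ON m.g = t.g AND m.max_v = t.v
WHERE t.g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9';

-- Option 3 (assumes MySQL 8.0+): number the rows inside each group
-- and keep the first one.
SELECT g, v
FROM (
    SELECT g, v,
           ROW_NUMBER() OVER (PARTITION BY g ORDER BY v DESC) AS rn
    FROM t
) AS ranked
WHERE rn = 1;
```

None of these depend on the order in which a derived table is processed, so an outer WHERE condition does not change which value is returned.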

I tested it with this data:

CREATE TABLE t (
    v INT,
    g VARCHAR(36)
);

INSERT INTO t VALUES (1, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');
INSERT INTO t VALUES (2, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');

in MySQL 5.6.41.

Maybe it is just a bug that has been, or will be, fixed in newer versions. Please give feedback if you have experience with newer versions.