Is there a way to group by a unique (primary) key, essentially giving an implicit guarantee that the other columns from that table will be well-defined?
SELECT myPrimaryKey, otherThing
FROM myTable
GROUP BY myPrimaryKey
I know that I can add the other columns to the statement (GROUP BY myPrimaryKey, otherThing), but I'm trying to avoid that. If you're curious why, read on:
I have a statement which is essentially doing this:
SELECT nodes.node_id, nodes.node_label, COUNT(1)
FROM {a couple of joined tables}
INNER JOIN nodes USING (node_id)
GROUP BY nodes.node_id, nodes.node_label
which works fine, but is a bit slow in MySQL. If I remove nodes.node_label from the GROUP BY, it runs about 10x faster (according to EXPLAIN, this is because one of the earlier joins starts using an index when previously it didn't).
We're in the process of migrating to Postgres, so all new statements are supposed to be compatible with both MySQL and Postgres when possible. Now in Postgres, the original statement runs fast, but the new statement (with the reduced group by) won't run (because Postgres is stricter). In this case, it's a false error because the statement is actually well-defined.
Is there a syntax I can use which will let the same statement run in both platforms, while letting MySQL use just one column in the group by for speed?
Grouping by the primary key of a table never merges two rows of that table, because each key value identifies exactly one row. So if we group by a table's primary key, every other column of that table has a single well-defined value per group and can be selected without an aggregate function.
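Applied to the query from the question, that reasoning suggests the reduced GROUP BY should be acceptable wherever the database recognizes the primary key dependency. The following is a minimal sketch, not a tested migration recipe: the edges table is a hypothetical stand-in for the "couple of joined tables", and only node_id and node_label come from the question. PostgreSQL 9.1 and later accept an ungrouped column that depends on a grouped primary key, and MySQL from 5.7.5 performs a similar check when ONLY_FULL_GROUP_BY is enabled (older MySQL versions without that mode accept the query anyway).

```sql
-- Hypothetical schema for illustration only.
CREATE TABLE nodes (
    node_id    INTEGER PRIMARY KEY,
    node_label VARCHAR(100) NOT NULL
);

-- Stand-in for "a couple of joined tables" (an assumption, not from the question).
CREATE TABLE edges (
    edge_id INTEGER PRIMARY KEY,
    node_id INTEGER NOT NULL
);

-- Group only by the primary key of nodes; node_label is functionally
-- dependent on nodes.node_id, so it needs no aggregate.
SELECT nodes.node_id, nodes.node_label, COUNT(1)
FROM edges
INNER JOIN nodes USING (node_id)
GROUP BY nodes.node_id;
```

Qualifying the grouped column as nodes.node_id matters here. I would not rely on an unqualified node_id being resolved to the nodes table when the join uses USING.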
You can use a SELECT command with a GROUP BY clause to collapse all rows that have identical values in a specified column, or combination of columns, into a single row.
In standard SQL, you cannot select a non-aggregated column unless it appears in the GROUP BY list or is functionally dependent on the grouped columns.
The reverse is allowed: you can GROUP BY a column that is not included in the SELECT list. For example, a query can group its data by a price column without listing price in the SELECT.
In more recent versions of MySQL you might have sql_mode=only_full_group_by enabled, which doesn't allow selecting non-aggregated columns when using GROUP BY, i.e. it forces you to use a function like max() or avg() or group_concat(), even when you just want any value.
This flag is enabled by default in MySQL 5.7.
The function any_value(), also available in MySQL 5.7, is meant for exactly this case.
You can achieve the same effect without disabling ONLY_FULL_GROUP_BY by using ANY_VALUE() to refer to the nonaggregated column.
select t.index, any_value(t.insert_date)
from my_table t
group by t.index;
More information here: https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by and here: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
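Since the question asks for one statement that runs on both platforms, note that ANY_VALUE() is not portable to older Postgres releases (as far as I know it only arrived in PostgreSQL 16). A plain aggregate such as MAX() or MIN() is accepted by both. The sketch below assumes a hypothetical edges table in place of the real joins. Because node_label is constant within each node_id group, MAX() returns that one value.

```sql
-- Hypothetical schema for illustration only.
CREATE TABLE nodes (
    node_id    INTEGER PRIMARY KEY,
    node_label VARCHAR(100) NOT NULL
);

CREATE TABLE edges (
    edge_id INTEGER PRIMARY KEY,
    node_id INTEGER NOT NULL
);

-- MAX() is only a wrapper to satisfy strict GROUP BY checking;
-- every row in a node_id group carries the same node_label.
SELECT nodes.node_id, MAX(nodes.node_label) AS node_label, COUNT(1)
FROM edges
INNER JOIN nodes USING (node_id)
GROUP BY nodes.node_id;
```

Whether MySQL keeps the faster plan with the extra aggregate is something to confirm with EXPLAIN on the real tables.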
In Postgres (not in MySQL, though), you could use DISTINCT ON to pick a single, consistent row per value (or group of values) without aggregating them:
SELECT DISTINCT ON (n.node_id)
* -- select any or all columns of all joined tables
FROM {a couple of joined tables}
JOIN nodes n USING (node_id)
That gives you a single, arbitrary row for each node_id. To pick a specific row, add:
ORDER BY n.node_id, ... -- what to sort first?
Add more ORDER BY items to pick a specific row. Details:
Select first row in each GROUP BY group?
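To make the ORDER BY part concrete, here is a small Postgres-only sketch that keeps the most recent joined row per node. The events table and its columns are hypothetical and exist only for illustration. The leading ORDER BY item has to match the DISTINCT ON expression, and the remaining items decide which row wins.

```sql
-- PostgreSQL only. Hypothetical schema for illustration.
CREATE TABLE nodes (
    node_id    integer PRIMARY KEY,
    node_label text NOT NULL
);

CREATE TABLE events (
    event_id   integer PRIMARY KEY,
    node_id    integer NOT NULL,
    created_at timestamp NOT NULL
);

-- One row per node: the latest event, with event_id as a tiebreaker
-- so the chosen row is deterministic.
SELECT DISTINCT ON (n.node_id)
       n.node_id, n.node_label, e.event_id, e.created_at
FROM events e
JOIN nodes n USING (node_id)
ORDER BY n.node_id, e.created_at DESC, e.event_id DESC;
```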