
MySQL - Control which row is returned by a group by

I have a database table like this:

id    version_id    field1    field2
1     1             texta     text1
1     2             textb     text2
2     1             textc     text3
2     2             textd     text4
2     3             texte     text5

If you didn't work it out, it contains a number of versions of a row, and then some text data.

I want to query it and return the version with the highest number for each id. (so the second and last rows only in the above).

I've tried using GROUP BY while ordering by version_id DESC, but the ordering seems to be applied after the rows are grouped, so this doesn't work.

Anyone got any ideas? I can't believe it can't be done!

UPDATE:

I've come up with this, which works, but uses a subquery:

SELECT *
FROM (SELECT * FROM table ORDER BY version_id DESC) t1
GROUP BY t1.id
benlumley asked Feb 11 '09 15:02




1 Answer

This is called selecting the group-wise maximum of a column. Here are several different approaches for MySQL.

Here's how I would do it:

SELECT *
FROM (SELECT id, MAX(version_id) AS version_id
      FROM table
      GROUP BY id) t1
INNER JOIN table t2
        ON t2.id = t1.id AND t1.version_id = t2.version_id

This will be relatively efficient, though MySQL will create a temporary table in memory for the subquery. I assume you already have an index on (id, version_id) for this table.
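If you want to try the join against the sample data from the question, here is a minimal sketch. It makes a few assumptions that are not in the original post: it uses Python's built-in sqlite3 module as a stand-in for a MySQL server, the table is called `versions` (a hypothetical name, since `table` is a reserved word), and the index name `idx_id_version` is made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the query can run without a server.
# `versions` and `idx_id_version` are hypothetical names, not from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE versions ("
    "id INTEGER, version_id INTEGER, field1 TEXT, field2 TEXT)"
)
# The composite index on (id, version_id) that the answer assumes exists.
conn.execute("CREATE INDEX idx_id_version ON versions (id, version_id)")

# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO versions VALUES (?, ?, ?, ?)",
    [
        (1, 1, "texta", "text1"),
        (1, 2, "textb", "text2"),
        (2, 1, "textc", "text3"),
        (2, 2, "textd", "text4"),
        (2, 3, "texte", "text5"),
    ],
)

# The derived table t1 finds the highest version_id per id;
# joining back to the base table fetches the full row for that version.
query = """
SELECT t2.id, t2.version_id, t2.field1, t2.field2
FROM (SELECT id, MAX(version_id) AS version_id
      FROM versions
      GROUP BY id) t1
INNER JOIN versions t2
        ON t2.id = t1.id AND t2.version_id = t1.version_id
ORDER BY t2.id
"""
latest_rows = conn.execute(query).fetchall()
print(latest_rows)
```

This should give one row per id: version 2 for id 1 and version 3 for id 2. Unlike relying on which row GROUP BY happens to pick, the join states explicitly which version is wanted, so the result does not depend on row order.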

It's a deficiency in SQL that you more or less have to use a subquery for this type of problem (semi-joins are another example).

Subqueries are not well optimized in MySQL, but uncorrelated subqueries aren't so bad as long as they aren't so enormous that they get written to disk rather than kept in memory. Given that the subquery in this query returns only two ints per row, it could be millions of rows long before that happened, but the SELECT * subquery in your first query could suffer from this problem much sooner.

ʞɔıu answered Oct 06 '22 01:10
