I'm trying to get records that have the highest value in one particular column ("version"). I'm using base_id to select rows, and there may be more than one row with the same base_id, but those rows will have different version numbers. So the point of the statement is to get only the one with the highest version. The statement below works, but only if there is actually more than one version. If there is only one, I get no records back at all (instead of the expected one row). So how can I get only the row with the highest version number, even when only one version exists for some records?
SELECT r.id
, r.title
, u.name created_by
, m.name modified_by
, r.version
, r.version_displayname
, r.informationtype
, r.filetype
, r.base_id
, r.resource_id
, r.created
, r.modified
, GROUP_CONCAT( CONCAT(CAST(c.id as CHAR),',',c.name,',',c.value) separator ';') categories
FROM resource r
JOIN category_resource cr
ON r.id = cr.resource_id
JOIN category c
ON cr.category_id = c.id
JOIN user u
ON r.created_by = u.id
JOIN user m
ON r.modified_by = m.id
WHERE r.base_id = 'uuid_033a7198-a213-11e3-93de-2b47e5a489c2'
AND r.version = (SELECT MAX(r.version) FROM resource r)
GROUP
BY r.id;
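A likely cause of the empty result: in the condition "AND r.version = (SELECT MAX(r.version) FROM resource r)", the inner alias r shadows the outer one, so the subquery is not correlated with the current row. It returns the highest version in the entire table, and any base_id whose newest version is lower than that never matches. The minimal sketch below assumes a cut-down resource table with made-up rows and uses Python's built-in sqlite3 module (SQLite standing in for MySQL) to contrast that pattern with a correlated subquery that only looks at rows sharing the same base_id.

```python
import sqlite3

# Hypothetical, cut-down version of the resource table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource (id INTEGER PRIMARY KEY, base_id TEXT, version INTEGER);
    INSERT INTO resource VALUES (1, 'a', 1), (2, 'a', 2), (3, 'a', 3), (4, 'b', 1);
""")

# Original pattern: the subquery ignores base_id, so it always yields the
# highest version in the whole table (3 here), and base 'b' never matches.
uncorrelated = conn.execute("""
    SELECT r.id FROM resource r
    WHERE r.base_id = ?
      AND r.version = (SELECT MAX(version) FROM resource)
""", ("b",)).fetchall()

# Correlated version: a separate alias (r2) lets the inner query refer to the
# outer row, so it only considers rows with the same base_id.
correlated = conn.execute("""
    SELECT r.id FROM resource r
    WHERE r.base_id = ?
      AND r.version = (SELECT MAX(r2.version) FROM resource r2
                       WHERE r2.base_id = r.base_id)
""", ("b",)).fetchall()

print(uncorrelated)  # []
print(correlated)    # [(4,)]
```

With base 'a' both queries happen to return the same row, because its newest version is also the table-wide maximum, which is why the original query can appear to work for some base_ids.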
EDIT:
I realize the other parts of the query may complicate things, so here is a cleaner example that I hope shows what I'm after.
If I do this:
SELECT id, title, MAX(version) AS 'version' FROM resource GROUP BY title
on a table that looks like this:

Then I get the following results:

That is not correct, as you can see from the table: it fetched the highest version for each resource, but look at Introduction, for example. The row with version 2 has id 6, whereas the row returned has id 1. So the query seems to somehow combine values from different rows...?
I should note that I'm very much a novice at SQL, and the original query I used to illustrate the problem was something I got help with here, so please explain as clearly as possible, thanks.
I also found a suggestion to use a subquery, but apart from also not returning the correct results, it was really slow. I'm testing on 5000 rows, and to meet performance requirements the query needs to take only a fraction of a second.
EDIT 2:
I found a way to incorporate a statement similar to one of the suggested ones, and I also tried the various solutions in the question "Retrieving the last record in each group".
However, even though most of them seem to work, they are incredibly slow…
Take this one:
SELECT
r.id, r.title,
u.name AS 'created_by', m.name AS 'modified_by', r.version, r.version_displayname, r.informationtype,
r.filetype, r.base_id, r.resource_id, r.created, r.modified,
GROUP_CONCAT( CONCAT(CAST(c.id as CHAR),',',c.name,',',c.value) separator ';') AS 'Categories'
FROM
resource r
INNER JOIN
(SELECT
DISTINCT r.id AS id
FROM
resource r
INNER JOIN
category_resource cr1 ON (r.id = cr1.resource_id)
WHERE
cr1.category_id IN (9)
) mr
ON r.id = mr.id
INNER JOIN category_resource cr
ON r.id = cr.resource_id
INNER JOIN category c
ON cr.category_id = c.id
INNER JOIN user u
ON r.created_by = u.id
INNER JOIN user m
ON r.modified_by = m.id
INNER JOIN
(
SELECT max(version) MyVersion, base_id
FROM resource
GROUP BY base_id
) r2
ON r.base_id = r2.base_id
AND r.version = r2.MyVersion
group by r.base_id
order by r.version desc;
Adding the final INNER JOIN (the one on the r2 derived table), which keeps only the rows with the highest version for each base_id, slows the query down from 20 ms to around 6-8 seconds. That is a no-go… And it surprises me: although I'm obviously no database expert, I would expect databases to be optimized for fetching data like this. The only alternative I can think of is to fetch all records regardless of version number and then filter them in PHP, and guess what? That is much faster than this…
I initially thought the performance hit from filtering in PHP would be too much, but it adds only about a second of delay, which is still much better than this.
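The application-side filtering described above (fetch everything, then keep the highest version per base_id) is a single pass over the rows with a dictionary keyed by base_id. A minimal sketch follows; Python stands in for PHP here, and the row shape (dicts with "id", "base_id" and "version" keys) and the function name latest_per_base are hypothetical.

```python
from typing import Any


def latest_per_base(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Keep only the row with the highest 'version' for each 'base_id'."""
    latest: dict[str, dict[str, Any]] = {}
    for row in rows:
        current = latest.get(row["base_id"])
        # Replace the stored row only if this one has a higher version.
        if current is None or row["version"] > current["version"]:
            latest[row["base_id"]] = row
    return latest


# Made-up rows, loosely modelled on the "Introduction" example (ids 1 and 6).
rows = [
    {"id": 1, "base_id": "a", "version": 1},
    {"id": 6, "base_id": "a", "version": 2},
    {"id": 4, "base_id": "b", "version": 1},
]
result = latest_per_base(rows)
print(sorted(r["id"] for r in result.values()))  # [4, 6]
```

This runs in linear time, but it still transfers every version from the database, so it scales with the total number of versions rather than the number of base_ids.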
But I feel like I'm missing something: shouldn't it be possible to do this much more efficiently?
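A general technique that the thread itself does not discuss: a "greatest version per group" query like the r2 derived table is usually only fast when an index covers the grouping and the aggregated column, e.g. a composite index on (base_id, version). The sketch below assumes a simplified resource table filled with 5000 synthetic rows and a hypothetical index name; it uses SQLite's EXPLAIN QUERY PLAN only to show that the planner can answer the GROUP BY from that index alone. MySQL accepts the same CREATE INDEX statement, but its planner differs, so check the real query there with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, base_id TEXT, version INTEGER)")
# 5000 synthetic rows: 50 base_ids with versions 0..99 each.
conn.executemany(
    "INSERT INTO resource (base_id, version) VALUES (?, ?)",
    [(f"base{i % 50}", i // 50) for i in range(5000)],
)
# Composite index: entries for one base_id sit together, ordered by version.
conn.execute("CREATE INDEX idx_resource_base_version ON resource (base_id, version)")

query = "SELECT base_id, MAX(version) FROM resource GROUP BY base_id"
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
latest = conn.execute(query).fetchall()
print(plan)
print(len(latest))  # 50
```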
Given your own answer, your question was basically the same as the one in the link you supplied. Since you had some sub-questions, I'll try to give you some additional help with those.
If you want some kind of version control in your database, you basically extend your primary key with one or more version columns. I'd vote for using start-date/end-date columns, too, for the reason you mentioned. Given your own answer, you could modify your layout accordingly. That's the route you should take if you can!
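One way the start-date/end-date layout can look is sketched below, assuming hypothetical valid_from/valid_to columns where valid_to IS NULL marks the current version; the column names, dates and the "Summary" row are made up for illustration. With this layout the current versions come from a plain WHERE clause, with no MAX() and no self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: each version row records when it became valid and
    -- when it was superseded; valid_to IS NULL marks the current version.
    CREATE TABLE resource (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to TEXT
    );
    INSERT INTO resource VALUES
        (1, 'Introduction', '2014-01-01', '2014-03-01'),
        (6, 'Introduction', '2014-03-01', NULL),
        (2, 'Summary',      '2014-01-15', NULL);
""")

# Current versions: a simple filter on valid_to.
current = conn.execute(
    "SELECT id, title FROM resource WHERE valid_to IS NULL ORDER BY id"
).fetchall()

# The version of one title that was valid at a given point in time.
as_of = conn.execute(
    "SELECT id FROM resource WHERE title = ? AND valid_from <= ? "
    "AND (valid_to IS NULL OR valid_to > ?)",
    ("Introduction", "2014-02-01", "2014-02-01"),
).fetchall()

print(current)  # [(2, 'Summary'), (6, 'Introduction')]
print(as_of)    # [(1,)]
```

The trade-off is on the write side: saving a new version means setting valid_to on the previous row and inserting the new one, ideally in the same transaction.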
In your example it is not clear what the primary key is, since the 'id' column also has different values across versions. In your case the primary key would be the 'title' column. So you could use a query like
SELECT title, max(version) as version FROM resource GROUP BY title
to get a result that shows your original primary key and its latest version, which together form your actual primary key.
To get all other fields in that table, you'd join that result to the resource table and use the primary key fields as join condition.
SELECT r.* FROM (
SELECT title, max(version) as version
FROM resource
GROUP BY title) as s
INNER JOIN resource r on (r.title = s.title AND r.version = s.version)
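To check the join-on-max query against the Introduction example from the question, here is a minimal runnable sketch using Python's sqlite3 module as a stand-in for MySQL. Only the ids 1 and 6 for Introduction come from the question; the table is cut down to three columns and the "Summary" row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT, version INTEGER);
    INSERT INTO resource VALUES
        (1, 'Introduction', 1),
        (6, 'Introduction', 2),
        (2, 'Summary', 1);
""")

# Derived table s holds (title, latest version); joining back on both
# columns returns the complete row that actually has that version.
rows = conn.execute("""
    SELECT r.id, r.title, r.version
    FROM (SELECT title, MAX(version) AS version
          FROM resource
          GROUP BY title) AS s
    INNER JOIN resource r
        ON r.title = s.title AND r.version = s.version
    ORDER BY r.id
""").fetchall()
print(rows)  # [(2, 'Summary', 1), (6, 'Introduction', 2)]
```

Note that if two rows of the same title ever share the highest version, both are returned, so (title, version) should be unique for this to yield one row per title.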
Why did your query give you wrong results?
The reason is that your query contained an error that MySQL somewhat "fixed" for you. Normally you need to list every column that you don't use in an aggregate function (like MAX()) in your GROUP BY clause. In your example
SELECT id, title, MAX(version) AS 'version' FROM resource GROUP BY title
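you had a column ('id') in the SELECT part of your query that you didn't list in your GROUP BY clause.
MySQL lets you ignore that rule (see the MySQL documentation on its handling of GROUP BY). Note that from MySQL 5.7.5 on, the ONLY_FULL_GROUP_BY SQL mode is enabled by default, and it rejects queries like this one.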
When using this feature, all rows in each group should have the same values for the columns that are omitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.
Since the 'id' column had different values for your key (the 'title' column), you just got some result; in this case MySQL probably used the first row it found. But the result itself is undefined and might change, e.g. when the database gets updated or the data grows. You should not depend on rules you deduce from results you see while testing!
On other databases such as Oracle and SQL Server, you would have gotten an error trying to execute that query.
I hope I could clarify the reason for your results a little.