I've got two tables in a SQL Server 2000 database joined by a parent-child relationship. In the child table, the unique key is made up of the parent id and the datestamp.
I need to do a join on these tables such that only the most recent child entry for each parent is joined.
Can anyone give me any hints on how I can go about this?
Here's the most efficient way I've found to do this. I tested it against several table structures, and it had the lowest I/O of the approaches I compared.
This sample gets the last revision of each article:
SELECT t.*, lastHis.*
FROM ARTICLES AS t
--Join each article to its history entries
INNER JOIN REVISION lastHis ON t.ID = lastHis.FK_ID
--Look for a later history entry for the same article; the WHERE clause keeps only rows where none exists
LEFT JOIN REVISION his2 ON lastHis.FK_ID = his2.FK_ID AND lastHis.CREATED_TIME < his2.CREATED_TIME
WHERE his2.ID IS NULL
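To see how the filter behaves on real rows, here is a minimal sketch that runs the same join shape against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server 2000 here, and the TITLE column, the sample rows, and the function names latest_revisions and build_demo_db are assumptions made up for the demonstration; only ID, FK_ID and CREATED_TIME come from the query above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ARTICLES (ID INTEGER PRIMARY KEY, TITLE TEXT);
        CREATE TABLE REVISION (
            ID INTEGER PRIMARY KEY,
            FK_ID INTEGER REFERENCES ARTICLES(ID),
            CREATED_TIME TEXT  -- ISO dates, so text comparison orders them correctly
        );
        INSERT INTO ARTICLES VALUES (1, 'first'), (2, 'second'), (3, 'no revisions');
        INSERT INTO REVISION VALUES
            (10, 1, '2008-01-01'),
            (11, 1, '2008-03-01'),
            (12, 1, '2008-02-01'),
            (20, 2, '2008-05-01');
    """)
    return conn


def latest_revisions(conn):
    """Return (article id, title, latest revision time) for each article."""
    query = """
        SELECT t.ID, t.TITLE, lastHis.CREATED_TIME
        FROM ARTICLES AS t
        INNER JOIN REVISION AS lastHis ON t.ID = lastHis.FK_ID
        -- his2 matches any later revision of the same article
        LEFT JOIN REVISION AS his2
            ON lastHis.FK_ID = his2.FK_ID
           AND lastHis.CREATED_TIME < his2.CREATED_TIME
        -- keep only revisions for which no later one was found
        WHERE his2.ID IS NULL
        ORDER BY t.ID
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in latest_revisions(build_demo_db()):
        print(row)
```

Note that the article without any revisions drops out of the result because of the INNER JOIN. If you need those rows as well, the first join would have to become a LEFT JOIN.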
If you had a table which just contained the most recent entry for each parent, and the parent's id, then it would be easy, right?
You can make a table like that by joining the child table on itself, taking only the maximum datestamp for each parent id. Something like this (your SQL dialect may vary):
SELECT t1.*
FROM child AS t1
LEFT JOIN child AS t2
ON (t1.parent_id = t2.parent_id and t1.datestamp < t2.datestamp)
WHERE t2.datestamp IS NULL
That gets you all of the rows in the child table for which no later datestamp exists for that parent id. You can use that result as a subquery to join to:
SELECT *
FROM parent
JOIN ( SELECT t1.*
FROM child AS t1
LEFT JOIN child AS t2
ON (t1.parent_id = t2.parent_id and t1.datestamp < t2.datestamp)
WHERE t2.datestamp IS NULL ) AS most_recent_children
ON (parent.id = most_recent_children.parent_id)
Or join the parent table directly into it:
SELECT parent.*, t1.*
FROM parent
JOIN child AS t1
ON (parent.id = t1.parent_id)
LEFT JOIN child AS t2
ON (t1.parent_id = t2.parent_id and t1.datestamp < t2.datestamp)
WHERE t2.datestamp IS NULL
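As a quick sanity check, the sketch below runs both forms against an in-memory SQLite database from Python and compares their results. SQLite stands in for SQL Server 2000 here, and the sample rows and the names make_db and run are assumptions for illustration. The table and column names follow the queries above, and the primary key on (parent_id, datestamp) mirrors the unique key described in the question.

```python
import sqlite3

# Form 1: join the parent to a subquery holding the most recent child rows.
SUBQUERY_FORM = """
    SELECT parent.id, most_recent_children.datestamp
    FROM parent
    JOIN ( SELECT t1.*
           FROM child AS t1
           LEFT JOIN child AS t2
             ON (t1.parent_id = t2.parent_id AND t1.datestamp < t2.datestamp)
           WHERE t2.datestamp IS NULL ) AS most_recent_children
      ON (parent.id = most_recent_children.parent_id)
    ORDER BY parent.id
"""

# Form 2: join the parent directly into the self-join.
DIRECT_FORM = """
    SELECT parent.id, t1.datestamp
    FROM parent
    JOIN child AS t1
      ON (parent.id = t1.parent_id)
    LEFT JOIN child AS t2
      ON (t1.parent_id = t2.parent_id AND t1.datestamp < t2.datestamp)
    WHERE t2.datestamp IS NULL
    ORDER BY parent.id
"""


def make_db():
    """Build a small hypothetical parent/child data set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE parent (id INTEGER PRIMARY KEY);
        CREATE TABLE child (
            parent_id INTEGER NOT NULL REFERENCES parent(id),
            datestamp TEXT NOT NULL,  -- ISO dates compare correctly as text
            PRIMARY KEY (parent_id, datestamp)
        );
        INSERT INTO parent VALUES (1), (2);
        INSERT INTO child VALUES
            (1, '2008-09-01'), (1, '2008-09-15'), (2, '2008-08-20');
    """)
    return conn


def run(sql):
    """Run one query against a fresh copy of the sample data."""
    conn = make_db()
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(SUBQUERY_FORM))
    print(run(DIRECT_FORM))
```

Because (parent_id, datestamp) is unique, no two child rows can tie for the latest datestamp, so each parent with children appears exactly once in either form.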