Selecting most recent and specific version in each group of records, for multiple groups

Tags:

sql

sql-server

tsql

The problem:
I have a table, foo, that stores data rows. Each time a row is updated, a new row is inserted with a new revision number. The table looks like:

id  rev field
1   1   test1
2   1   fsdfs
3   1   jfds
1   2   test2

Note: the last record is a newer version of the first row.

Is there an efficient way to query for the latest version of a record and for a specific version of a record?

For instance, a query for rev=2 would return the 2nd, 3rd and 4th rows (but not the replaced 1st row), while a query for rev=1 would return the rows with rev <= 1; where an id has several such rows, the one with the highest revision number is chosen (rows 1, 2 and 3).

I would prefer not to build the result iteratively.

asked Feb 24 '12 12:02

orange

7 Answers

To get only latest revisions:

SELECT * FROM foo t1
WHERE t1.rev =
  (SELECT max(rev) FROM foo t2 WHERE t2.id = t1.id)

To get a specific revision, in this case 1 (if an item has no row at that revision, its highest lower revision is returned instead):

SELECT * from foo t1
WHERE t1.rev = 
  (SELECT max(rev) 
   FROM foo t2 
   WHERE t2.id = t1.id
   AND t2.rev <= 1)

This might not be the most efficient approach, but right now I cannot think of a better one.

answered Oct 19 '22 12:10

Tim

Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest rows, as it avoids computing MAX(rev). It also works when you're bulk-updating subsets of the table. I needed this pattern so that I could switch to a new data set, produced by a long-running batch update, without any window of time in which partially updated data was visible.

Aging

1. Replace the rev column with an age column.
2. Create a view of the current latest data with the filter age = 0.
3. To create a new version of your data:
   - INSERT: add the new rows with age = -1. This was my slow, long-running batch process.
   - UPDATE: run UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data, in a single transaction.
   - DELETE: optionally purge old data by deleting rows with age > N in the subset.
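
The steps above can be sketched in T-SQL roughly as follows. The table name foo_aged, the view name foo_current and the retention limit of 3 versions are assumptions made for illustration, and the "subset" here is simply the whole table. GO is the batch separator used by SSMS and sqlcmd; it is needed because CREATE VIEW must be the first statement in its batch.

```sql
-- Hypothetical table: age replaces rev (0 = current, higher = older, -1 = staged).
CREATE TABLE foo_aged (
    id    int          NOT NULL,
    age   int          NOT NULL,
    field nvarchar(10) NULL
);
GO

-- Readers only ever see the current version.
CREATE VIEW foo_current AS
    SELECT id, field
    FROM foo_aged
    WHERE age = 0;
GO

-- Seed a current version.
INSERT INTO foo_aged (id, age, field) VALUES (1, 0, 'test1');

-- Stage the new version at age = -1 (the slow batch step); the view ignores it.
INSERT INTO foo_aged (id, age, field) VALUES (1, -1, 'test2');

-- Publish the staged rows and purge old ones in one transaction.
BEGIN TRANSACTION;
    UPDATE foo_aged SET age = age + 1;    -- -1 becomes 0, 0 becomes 1, and so on
    DELETE FROM foo_aged WHERE age > 3;   -- optional purge; 3 is an arbitrary limit
COMMIT TRANSACTION;

SELECT id, field FROM foo_current;        -- returns (1, 'test2')
```

Until the UPDATE commits, foo_current keeps returning 'test1', which is what hides the partially loaded batch from readers.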

Indexing

Create a composite index on age and then id, so that the view is fast and can also be used to look up rows by id. Although this key is effectively unique, it is temporarily non-unique while you're aging the rows (during UPDATE SET age = age + 1), so the index must be non-unique; ideally make it the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id and then age.
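
As a rough sketch of this indexing advice, assuming a hypothetical table foo_aged with id and age columns (the table and index names are not part of the pattern itself):

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE foo_aged (
    id    int          NOT NULL,
    age   int          NOT NULL,
    field nvarchar(10) NULL
);

-- Non-unique clustered index: (age, id) serves the age = 0 view
-- and lookups by id within the current version.
CREATE CLUSTERED INDEX IX_foo_aged_age_id ON foo_aged (age, id);

-- Optional: all versions of one id, ordered by age.
CREATE NONCLUSTERED INDEX IX_foo_aged_id_age ON foo_aged (id, age);
```

Leaving out the UNIQUE keyword is what makes both indexes non-unique, so the aging UPDATE cannot fail on a transient duplicate key.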

Rollback

Finally, let's say you're having a bad day and the batch processing breaks. You can quickly revert to the previous data set version by running:

UPDATE table-name SET age = age - 1;    -- Roll back a version
DELETE FROM table-name WHERE age < 0;   -- Clean up bad stuff

Existing Table

Suppose you have an existing table that now needs to support aging. You can apply this pattern by first renaming the existing table, then adding the age column and the index, and finally creating a view, with the same name as the original table, that includes the age = 0 condition.

This strategy may or may not work, depending on the technology layers that depend on the original table, but in many cases a view should drop in for a table just fine.
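
A possible conversion script is sketched below. It assumes a table dbo.my_table(id, field) with one row per id; that table, the new name my_table_data and the constraint and index names are all hypothetical. GO is the SSMS/sqlcmd batch separator.

```sql
-- Stand-in for the existing table (hypothetical).
CREATE TABLE dbo.my_table (
    id    int          NOT NULL,
    field nvarchar(10) NULL
);
GO

-- 1. Rename the existing table out of the way.
EXEC sp_rename 'dbo.my_table', 'my_table_data';
GO

-- 2. Add the age column; the default marks all existing rows as current (age = 0).
ALTER TABLE dbo.my_table_data
    ADD age int NOT NULL CONSTRAINT DF_my_table_data_age DEFAULT 0;
GO

-- 3. Index for the view. A clustered index is preferable if the table
--    does not already have one.
CREATE INDEX IX_my_table_data_age_id ON dbo.my_table_data (age, id);
GO

-- 4. The view takes over the original table name.
CREATE VIEW dbo.my_table AS
    SELECT id, field
    FROM dbo.my_table_data
    WHERE age = 0;
GO
```

Code that only reads from dbo.my_table should keep working unchanged; code that writes to it needs a closer look, since inserts through the view rely on the age default.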

Notes

I recommend naming the age column RowAge to indicate that this pattern is being used: the name makes it clear that it's a database-related value, and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.

This pattern relies only on standard SQL, so it also works on databases other than SQL Server.

If the subsets you're updating are very large, this might not be a good solution: the final transaction updates not just the current records but all past versions of the records in the subset (which could even be the entire table!), so you may end up locking the table.

answered Oct 19 '22 11:10

Tony O'Hagan

This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later; the multi-row VALUES list in the sample data requires SQL Server 2008 or later.

Sample data:

DECLARE @foo TABLE (
    id int,
    rev int,
    field nvarchar(10)
)

INSERT @foo VALUES
    ( 1, 1, 'test1' ),
    ( 2, 1, 'fdsfs' ),
    ( 3, 1, 'jfds' ),
    ( 1, 2, 'test2' )

The query:

DECLARE @desiredRev int

SET @desiredRev = 2

SELECT * FROM (
    SELECT
        id,
        rev,
        field,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) AS rn
    FROM @foo
    WHERE rev <= @desiredRev
) numbered
WHERE rn = 1

The inner SELECT returns all relevant records, and within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.

The outer SELECT just selects the first member (so, the one with highest rev) from each id group.

Output when @desiredRev = 2 :

id          rev         field      rn
----------- ----------- ---------- --------------------
1           2           test2      1
2           1           fdsfs      1
3           1           jfds       1

Output when @desiredRev = 1 :

id          rev         field      rn
----------- ----------- ---------- --------------------
1           1           test1      1
2           1           fdsfs      1
3           1           jfds       1

answered Oct 19 '22 10:10

AakashM

If you want the latest revision of each id, you can use

SELECT C.rev, C.field FROM (
  SELECT MAX(A.rev) AS rev, A.id
  FROM yourtable A
  GROUP BY A.id) 
AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev

In the case of your example, that would return

 rev field
 1   fsdfs   
 1   jfds   
 2   test2

answered Oct 19 '22 12:10

Treb

SELECT
  MaxRevs.id,
  revision.field
FROM
  (SELECT
     id,
     MAX(rev) AS MaxRev
   FROM revision
   GROUP BY id
  ) MaxRevs
  INNER JOIN revision 
    ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev

answered Oct 19 '22 10:10

Pittsburgh DBA

SELECT foo.* from foo 
left join foo as later 
on foo.id=later.id and later.rev>foo.rev 
where later.id is null;
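
The query above keeps a row only when no later revision of the same id exists. One way to extend it to a specific revision, which is not part of the original answer, is to cap both sides of the join at the wanted revision. The sketch below uses a table variable @foo loaded with the question's sample data and a variable @desiredRev; both names are assumptions for illustration.

```sql
-- Sample data from the question, in a table variable.
DECLARE @foo TABLE (id int, rev int, field nvarchar(10));
INSERT INTO @foo (id, rev, field) VALUES (1, 1, 'test1');
INSERT INTO @foo (id, rev, field) VALUES (2, 1, 'fsdfs');
INSERT INTO @foo (id, rev, field) VALUES (3, 1, 'jfds');
INSERT INTO @foo (id, rev, field) VALUES (1, 2, 'test2');

DECLARE @desiredRev int;
SET @desiredRev = 1;

-- Keep a row if it is within the cap and no later row within the cap exists.
SELECT f.*
FROM @foo AS f
LEFT JOIN @foo AS later
    ON later.id = f.id
   AND later.rev > f.rev
   AND later.rev <= @desiredRev
WHERE f.rev <= @desiredRev
  AND later.id IS NULL;
```

With @desiredRev = 1 this returns the three rev 1 rows; with @desiredRev = 2 the row (1, 1, 'test1') drops out in favour of (1, 2, 'test2').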

answered Oct 19 '22 12:10

crimaniak

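How about this? It returns the latest revision number for each id. SQL Server does not allow field in the select list here, because it is neither aggregated nor part of the GROUP BY list; to fetch it, join this result back to foo on id and rev.

select id, max(rev) as rev from foo group by id

For querying a specific revision, e.g. revision 1,

select id, max(rev) as rev from foo where rev <= 1 group by id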

answered Oct 19 '22 11:10

Joonhui Kim
