Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server - Select Rows, Remove Duplicates But Keep row with highest date

Considering the following database table in SQL Server 2008:

ActionID (PK)    ActionType    ActionDate              UserID  ContentID
1                'Create'      '2013-05-26 18:40:00'   1       10
2                'Create'      '2013-05-26 18:30:00'   2       10
3                'Edit'        '2013-05-26 12:30:00'   5       12
4                'Edit'        '2013-05-26 12:25:00'   5       12
5                'Delete'      '2013-05-26 12:22:00'   6       12

I want to write a SQL query that groups by ContentID and ActionType but where the row with the latest ActionDate is returned and other rows ignored, even if they have different UserID or other column values.

So what it should return is:

ActionID (PK)    ActionType    ActionDate              UserID  ContentID
1                'Create'      '2013-05-26 18:40:00'   1       10
3                'Edit'        '2013-05-26 12:30:00'   5       12
5                'Delete'      '2013-05-26 12:22:00'   6       12

But I can't quite figure out how to write the query to do it.

like image 758
Sunday Ironfoot Avatar asked Dec 21 '22 05:12

Sunday Ironfoot


1 Answers

One approach would be to use a CTE (Common Table Expression).

With this CTE, you can partition your data by some criteria - i.e. your ContentID and Actiontype - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by the ActionDate.

So try something like this:

;WITH Actions AS
(
   SELECT 
       ActionID, ActionType, ActionDate, UserID, ContentID,
       RowNum = ROW_NUMBER() OVER(PARTITION BY ContentID, ActionType ORDER BY ActionDate DESC)
   FROM 
       dbo.YourTable
   WHERE
      ......
)
SELECT 
   ActionID, ActionType, ActionDate, UserID, ContentID,
FROM 
   Actions
WHERE
   RowNum = 1
ORDER BY 
   ActionDate DESC

Does that approach what you're looking for??

like image 180
marc_s Avatar answered May 20 '23 05:05

marc_s