I have a table that contains user actions based on dates. The table is used as a timeline of events. The following example shows how two people changed their job roles through time:
DECLARE @tbl TABLE (
UserID int,
ActionID int,
ActionDesc nvarchar(50),
ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES
-- First person
(1, 200, 'Promoted', '2000-01-01'),
(1, 200, 'Promoted', '2001-01-01'),
(1, 200, 'Promoted', '2002-02-01'),
(1, 300, 'Moved', '2004-03-01'),
(1, 200, 'Promoted', '2005-03-01'),
(1, 200, 'Promoted', '2006-03-01'),
-- Second person
(2, 200, 'Promoted', '2006-01-01'),
(2, 300, 'Moved', '2007-01-01'),
(2, 200, 'Promoted', '2008-01-01');
SELECT * FROM @tbl ORDER BY UserID, ActionDate DESC;
This gives the following, shown as the most recent event first:
I need to show the table in reverse date order, but remove any events that occur directly after they have already just occured, based on a [UserID/ActionID] match. For example, if the person was promoted, and then promoted again straight after this, the second promotion would not be included in the results, because it would be considered a duplicate of the previous action.
The desired output therefore, is:
Following research, I tried to get ROW_NUMBER()
to identify the duplicates:
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY UserID, ActionID ORDER BY ActionDate ASC) AS RowNum
FROM
@tbl
ORDER BY
UserID, ActionDate DESC;
...But it doesn't quite work, as the numbering is not reset after each different action. I might be over-thinking this, but am struggling for inspiration because search results are returning the myriad of questions where people are simply removing duplicates from lists.
I would use LEAD
to eliminate rows that are unnecessary.
USE tempdb;
DECLARE @tbl TABLE (
UserID int,
ActionID int,
ActionDesc nvarchar(50),
ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES
-- First person
(1, 200, 'Promoted', '2000-01-01'),
(1, 200, 'Promoted', '2001-01-01'),
(1, 200, 'Promoted', '2002-02-01'),
(1, 300, 'Moved', '2004-03-01'),
(1, 200, 'Promoted', '2005-03-01'),
(1, 200, 'Promoted', '2006-03-01'),
-- Second person
(2, 200, 'Promoted', '2006-01-01'),
(2, 300, 'Moved', '2007-01-01'),
(2, 200, 'Promoted', '2008-01-01');
;WITH src AS
(
SELECT *
, l = LEAD(t.ActionID) OVER (PARTITION BY t.UserID ORDER BY t.ActionDate DESC)
FROM @tbl t
)
SELECT src.UserID
, src.ActionID
, src.ActionDesc
, src.ActionDate
FROM src
WHERE src.l <> src.ActionID
OR src.l IS NULL
The WHERE
clause in the above query eliminates duplicate rows from the output where the previous row is a duplicate ActionID of the current row. The src.l IS NULL
ensures we see rows with no duplicate ActionIDs.
The results:
╔════════╦══════════╦════════════╦═════════════════════════╗ ║ UserID ║ ActionID ║ ActionDesc ║ ActionDate ║ ╠════════╬══════════╬════════════╬═════════════════════════╣ ║ 1 ║ 200 ║ Promoted ║ 2005-03-01 00:00:00.000 ║ ║ 1 ║ 300 ║ Moved ║ 2004-03-01 00:00:00.000 ║ ║ 1 ║ 200 ║ Promoted ║ 2000-01-01 00:00:00.000 ║ ║ 2 ║ 200 ║ Promoted ║ 2008-01-01 00:00:00.000 ║ ║ 2 ║ 300 ║ Moved ║ 2007-01-01 00:00:00.000 ║ ║ 2 ║ 200 ║ Promoted ║ 2006-01-01 00:00:00.000 ║ ╚════════╩══════════╩════════════╩═════════════════════════╝
For tables with a large number of rows, you want to reduce the number of aggregates used in your query to the minimum possible; LEAD provides just this by requiring only a single aggregate. The execution plan for my version:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With