Remove consecutively-repeating values that occur within a date-based timeline

Question

I have a table that contains user actions based on dates. The table is used as a timeline of events. The following example shows how two people changed their job roles through time:

DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES 
    -- First person
    (1, 200, 'Promoted',   '2000-01-01'),   
    (1, 200, 'Promoted',   '2001-01-01'),   
    (1, 200, 'Promoted',   '2002-02-01'),   
    (1, 300, 'Moved',      '2004-03-01'),   
    (1, 200, 'Promoted',   '2005-03-01'),   
    (1, 200, 'Promoted',   '2006-03-01'),
    -- Second person
    (2, 200, 'Promoted',   '2006-01-01'),   
    (2, 300, 'Moved',      '2007-01-01'),
    (2, 200, 'Promoted',   '2008-01-01');

SELECT * FROM @tbl ORDER BY UserID, ActionDate DESC;

This gives the following, shown as the most recent event first:

enter image description here

I need to show the table in reverse date order, but remove any events that occur directly after they have already just occured, based on a [UserID/ActionID] match. For example, if the person was promoted, and then promoted again straight after this, the second promotion would not be included in the results, because it would be considered a duplicate of the previous action.

The desired output therefore, is:

enter image description here

Following research, I tried to get ROW_NUMBER() to identify the duplicates:

SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY UserID, ActionID ORDER BY ActionDate ASC) AS RowNum
FROM
    @tbl
ORDER BY
    UserID, ActionDate DESC;

...But it doesn't quite work, as the numbering is not reset after each different action. I might be over-thinking this, but am struggling for inspiration because search results are returning the myriad of questions where people are simply removing duplicates from lists.

Hannah Vernon · Accepted Answer

I would use LEAD to eliminate rows that are unnecessary.

USE tempdb;

DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES 
    -- First person
    (1, 200, 'Promoted',   '2000-01-01'),   
    (1, 200, 'Promoted',   '2001-01-01'),   
    (1, 200, 'Promoted',   '2002-02-01'),   
    (1, 300, 'Moved',      '2004-03-01'),   
    (1, 200, 'Promoted',   '2005-03-01'),   
    (1, 200, 'Promoted',   '2006-03-01'),
    -- Second person
    (2, 200, 'Promoted',   '2006-01-01'),   
    (2, 300, 'Moved',      '2007-01-01'),
    (2, 200, 'Promoted',   '2008-01-01');

;WITH src AS
(
    SELECT *
        , l = LEAD(t.ActionID) OVER (PARTITION BY t.UserID ORDER BY t.ActionDate DESC)
    FROM @tbl t
)
SELECT src.UserID
    , src.ActionID
    , src.ActionDesc
    , src.ActionDate
FROM src
WHERE src.l <> src.ActionID 
    OR src.l IS NULL

The WHERE clause in the above query eliminates duplicate rows from the output where the previous row is a duplicate ActionID of the current row. The src.l IS NULL ensures we see rows with no duplicate ActionIDs.

The results:

╔════════╦══════════╦════════════╦═════════════════════════╗
║ UserID ║ ActionID ║ ActionDesc ║       ActionDate        ║
╠════════╬══════════╬════════════╬═════════════════════════╣
║      1 ║      200 ║ Promoted   ║ 2005-03-01 00:00:00.000 ║
║      1 ║      300 ║ Moved      ║ 2004-03-01 00:00:00.000 ║
║      1 ║      200 ║ Promoted   ║ 2000-01-01 00:00:00.000 ║
║      2 ║      200 ║ Promoted   ║ 2008-01-01 00:00:00.000 ║
║      2 ║      300 ║ Moved      ║ 2007-01-01 00:00:00.000 ║
║      2 ║      200 ║ Promoted   ║ 2006-01-01 00:00:00.000 ║
╚════════╩══════════╩════════════╩═════════════════════════╝

For tables with a large number of rows, you want to reduce the number of aggregates used in your query to the minimum possible; LEAD provides just this by requiring only a single aggregate. The execution plan for my version:

enter image description here

Remove consecutively-repeating values that occur within a date-based timeline

Tags:

sql-server

tsql

sql-server-2012

EvilDr

1 Answers

Hannah Vernon

Recent Activity

Donate For Us

Remove consecutively-repeating values that occur within a date-based timeline

Tags:

sql-server

tsql

sql-server-2012

EvilDr

1 Answers

Hannah Vernon

Related questions

Recent Activity

Donate For Us