Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove consecutively-repeating values that occur within a date-based timeline

I have a table that contains user actions based on dates. The table is used as a timeline of events. The following example shows how two people changed their job roles through time:

DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES 
    -- First person
    (1, 200, 'Promoted',   '2000-01-01'),   
    (1, 200, 'Promoted',   '2001-01-01'),   
    (1, 200, 'Promoted',   '2002-02-01'),   
    (1, 300, 'Moved',      '2004-03-01'),   
    (1, 200, 'Promoted',   '2005-03-01'),   
    (1, 200, 'Promoted',   '2006-03-01'),
    -- Second person
    (2, 200, 'Promoted',   '2006-01-01'),   
    (2, 300, 'Moved',      '2007-01-01'),
    (2, 200, 'Promoted',   '2008-01-01');

SELECT * FROM @tbl ORDER BY UserID, ActionDate DESC;

This gives the following, shown as the most recent event first:

enter image description here

I need to show the table in reverse date order, but remove any events that occur directly after they have already just occured, based on a [UserID/ActionID] match. For example, if the person was promoted, and then promoted again straight after this, the second promotion would not be included in the results, because it would be considered a duplicate of the previous action.

The desired output therefore, is:

enter image description here

Following research, I tried to get ROW_NUMBER() to identify the duplicates:

SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY UserID, ActionID ORDER BY ActionDate ASC) AS RowNum
FROM
    @tbl
ORDER BY
    UserID, ActionDate DESC;

...But it doesn't quite work, as the numbering is not reset after each different action. I might be over-thinking this, but am struggling for inspiration because search results are returning the myriad of questions where people are simply removing duplicates from lists.

like image 825
EvilDr Avatar asked Jan 27 '23 11:01

EvilDr


1 Answers

I would use LEAD to eliminate rows that are unnecessary.

USE tempdb;

DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES 
    -- First person
    (1, 200, 'Promoted',   '2000-01-01'),   
    (1, 200, 'Promoted',   '2001-01-01'),   
    (1, 200, 'Promoted',   '2002-02-01'),   
    (1, 300, 'Moved',      '2004-03-01'),   
    (1, 200, 'Promoted',   '2005-03-01'),   
    (1, 200, 'Promoted',   '2006-03-01'),
    -- Second person
    (2, 200, 'Promoted',   '2006-01-01'),   
    (2, 300, 'Moved',      '2007-01-01'),
    (2, 200, 'Promoted',   '2008-01-01');

;WITH src AS
(
    SELECT *
        , l = LEAD(t.ActionID) OVER (PARTITION BY t.UserID ORDER BY t.ActionDate DESC)
    FROM @tbl t
)
SELECT src.UserID
    , src.ActionID
    , src.ActionDesc
    , src.ActionDate
FROM src
WHERE src.l <> src.ActionID 
    OR src.l IS NULL

The WHERE clause in the above query eliminates duplicate rows from the output where the previous row is a duplicate ActionID of the current row. The src.l IS NULL ensures we see rows with no duplicate ActionIDs.

The results:

╔════════╦══════════╦════════════╦═════════════════════════╗
║ UserID ║ ActionID ║ ActionDesc ║       ActionDate        ║
╠════════╬══════════╬════════════╬═════════════════════════╣
║      1 ║      200 ║ Promoted   ║ 2005-03-01 00:00:00.000 ║
║      1 ║      300 ║ Moved      ║ 2004-03-01 00:00:00.000 ║
║      1 ║      200 ║ Promoted   ║ 2000-01-01 00:00:00.000 ║
║      2 ║      200 ║ Promoted   ║ 2008-01-01 00:00:00.000 ║
║      2 ║      300 ║ Moved      ║ 2007-01-01 00:00:00.000 ║
║      2 ║      200 ║ Promoted   ║ 2006-01-01 00:00:00.000 ║
╚════════╩══════════╩════════════╩═════════════════════════╝

For tables with a large number of rows, you want to reduce the number of aggregates used in your query to the minimum possible; LEAD provides just this by requiring only a single aggregate. The execution plan for my version:

enter image description here

like image 88
Hannah Vernon Avatar answered Feb 03 '23 07:02

Hannah Vernon