Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get Lead value over multiple partitions

I have a problem that I feel could be solved using lag/lead + partitions but I can't wrap my head around it.

Clients are invited to participate in research-projects every two years (aprox.). A number of clients is selected for each project. Some clients are selected for multiple research-projects. Those get sent an invitation. In some cases no invitation is sent. If a client does not react to an invitation, a 2nd invitation (reminder) is sent. A 3rd, a 4rd are also possible.

I need to find out whether a client has had an invitation for a previous research-project. (And optionally, which invitation that was).

The dataset looks like this:

clientID | projectID | invitationID
  14     |    267    |     489
  14     |    267    |     325
  16     |    385    |     475
  17     |    546    |     NULL
  17     |    547    |     885
  17     |    548    |     901
  18     |    721    |     905
  18     |    834    |     906
  18     |    834    |     907
  19     |    856    |     908
  19     |    856    |     929
  19     |    857    |     931
  19     |    857    |     945
  19     |    858    |     NULL


Client 14 has had 2 invitations for the same research-project
Client 16 has had 1 invitation for 1 research-project
Client 17 has been selected for 3 research-projects but opted out for project 546, receiving 1 invitation each for the following projects. 
Client 18 has been selected for 2 research-projects. For the second project he got a 2 invitations.
Client 19 has been selected for three research-projects. For the first two a reminder was set. Client 19 was selected for project 858 but opted out thus no invitation.

Now I need to determine per client whether there has been a invitation for a previous research-project. (And optionally, which invitation that was). I only need the first invitation (if there were multiple). So my resulting dataset should look like this (stuff between brackets is optional):

clientID | projectID | invitationID | InvitedForPreviousProject
  14     |    267    |     489      |      0
  14     |    267    |     325      |      0
  16     |    385    |     475      |      0
  17     |    546    |     NULL     |      0
  17     |    547    |     885      |      0
  17     |    548    |     901      |      1 (885)
  18     |    721    |     905      |      0
  18     |    834    |     906      |      1 (905)
  18     |    834    |     907      |      1 (905)
  19     |    856    |     908      |      0
  19     |    856    |     929      |      0
  19     |    857    |     931      |      1 (908)
  19     |    857    |     945      |      1 (908)
  19     |    858    |     NULL     |      1 (931)

Can this be done using LEAD, Rank, Dense-Rank? Create-statement including data below

declare @table table (
    [clientID] [int] NULL,
    [projectID] [int] NULL,
    [invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL)
like image 255
Henrov Avatar asked Nov 14 '18 10:11

Henrov


2 Answers

Might this help?

declare @table table (
    [clientID] [int] NULL,
    [projectID] [int] NULL,
    [invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL);

--The query uses DENSE_RANK() and a correlated sub-query

WITH ranked AS
(
    SELECT t.* 
         ,DENSE_RANK() OVER(PARTITION BY t.clientID ORDER BY t.projectID) AS InvRank
    FROM @table t
)
SELECT r.*
      ,earlierProject.invitationID
FROM ranked r
OUTER APPLY(SELECT TOP 1 *
            FROM ranked r2 
            WHERE r2.clientID=r.clientID
             AND  r2.projectID<r.projectID 
             AND  r2.InvRank=r.InvRank-1   
            ORDER BY invitationID ASC
            ) earlierProject
ORDER BY r.clientID,r.projectID,r.invitationID;

The invitationID will be NULL in case of "0" in your table and set to the needed value in case of an item found.

Hint

There's no need for the APPLY actually. If you'd need the invitationID only, you might place the sub-query as a column directly (slightly faster). But this is better to read and you could get hands on the other columns as well...

like image 191
Shnugo Avatar answered Oct 11 '22 22:10

Shnugo


You need a column that specifies the ordering. Let me assume that there is an invitation date as well as the other columns.

With this information, your flag is easily calculated by comparing two values:

  • minimum invitation date for the client
  • minimum invitation date for the client/project id

When these are the same, this is the first project with an invitation.

So:

select t.*,
       (case when min(invitationDate) over (partition by clientId order by invitationDate) =
                  min(invitationDate) over (partition by clientId, projectId order by invitationDate)
             then 0 else 1                  
        end) as InvitedForPreviousProject
from @table t;
like image 40
Gordon Linoff Avatar answered Oct 11 '22 21:10

Gordon Linoff