Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Identify records where a given sequence of events happens within x days

Tags:

r

data.table

I have a large data.table, similar in structure to df:

library("data.table")
df <- data.frame(part = c("A", "B", "A", "C", "A", "D", "B", "D", "E"), 
                 day = c(1, 2, 3, 4, 5, 6, 6, 7, 15), 
                 code = c("S", "S", "P", "X", "P", "S", "P", "P", "P"))
setDT(df)
df
   part day code
1:    A   1    S
2:    B   2    S
3:    A   3    P
4:    C   4    X
5:    A   5    P
6:    D   6    S
7:    B   6    P
8:    D   7    P
9:    E  15    P

How can I add a column that flags records where code = S and the same part has code = P within 3 subsequent days? Expected result:

   part day code  flag
1:    A   1    S  TRUE
2:    B   2    S FALSE
3:    A   3    P FALSE
4:    C   4    X FALSE
5:    A   5    P FALSE
6:    D   6    S  TRUE
7:    B   6    P FALSE
8:    D   7    P FALSE
9:    E  15    P FALSE
like image 325
ddunn801 Avatar asked Mar 10 '23 14:03

ddunn801


2 Answers

I think this does it

df[, v := FALSE ]
df[code == "S", v := !is.na(
  df[code == "P"][df[code == "S"], on=c("part", "day"), roll=-3, which=TRUE]
)]

   part day code     v
1:    A   1    S  TRUE
2:    B   2    S FALSE
3:    A   3    P FALSE
4:    C   4    X FALSE
5:    A   5    P FALSE
6:    D   6    S  TRUE
7:    B   6    P FALSE
8:    D   7    P FALSE
9:    E  15    P FALSE

How it works !is.na(x[i, which=TRUE]) tells us whether each row of i found a match in x. (This might break down if i finds multiple matches.) The roll part expands the scope of matches to cover rows that are nearby in terms of the last column being joined on.

I'm not sure that the roll value there is correct since I have never used it this way.

like image 180
Frank Avatar answered Mar 13 '23 02:03

Frank


This can work too-

df$v <- as.logical((df$code== "S") *
        c(sapply(seq(1:(nrow(df)-2)), function(x)
           {
            max(df[(x:x+2),"code"] == "P")
           }),
          df[nrow(df)-1,"code"]=="P",
          df[nrow(df),"code"]=="P"))
like image 44
code_is_entropy Avatar answered Mar 13 '23 02:03

code_is_entropy