I want to number certain combinations of row in a dataframe (which is ordered on ID and on Time)
tc <- textConnection('
id time end_yn
abc 10 0
abc 11 0
abc 12 1
abc 13 0
def 10 0
def 15 1
def 16 0
def 17 0
def 18 1
')
test <- read.table(tc, header=TRUE)
The goal is to create a new column ("number
") that numbers each row per id
from 1 to n
until end_yn == 1
is hit. After end_yn == 1
, the numbering should start over.
Without taking the end_yn == 1
condition into account the rows can be numbered using:
DT <- data.table(test)
DT[, id := seq_len(.N), by = id]
However the expected outcome should be:
id time end_yn number
abc 10 0 1
abc 11 0 2
abc 12 1 3
abc 13 0 1
def 10 0 1
def 15 1 2
def 16 0 1
def 17 0 2
def 18 1 3
How to incorporate the end_yn == 1
condition?
I'm guessing there are different ways to do this, but here's one:
DT[, cEnd := c(0,cumsum(end_yn)[-.N])] # carry the end value forward
DT[, number := seq_len(.N), by = "id,cEnd"] # create your sequence
DT[, cEnd := NULL] # remove the column created above
Setting id
as the key for DT
might be worth while.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With