Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

incremental counter within dataframe only when a condition is met in r

I would like to create an accumulative incremental counter that increases only when a condition is met.

DT <- data.table(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2), 
               b = c(10L, 5L, 3L, 4L, 2L, 6L, 1L, 3L, 5L, 7L))

I don't get the desired result with rleid because when two conditions are met in consecutive rows, the increment is not performed

> DT[,count := rleid(b>=5),id]
> DT
    id  b count
 1:  1 10     1
 2:  1  5     1
 3:  1  3     2
 4:  1  4     2
 5:  1  2     2
 6:  1  6     3
 7:  1  1     4
 8:  2  3     1
 9:  2  5     2
10:  2  7     2

The expected result is

> DT
    id  b count
 1:  1 10     1
 2:  1  5     2
 3:  1  3     2
 4:  1  4     2
 5:  1  2     2
 6:  1  6     3
 7:  1  1     3
 8:  2  3     1
 9:  2  5     2
10:  2  7     3
like image 827
Oriol Prat Avatar asked Dec 10 '22 23:12

Oriol Prat


1 Answers

Here is an option with cumsum. Grouped by 'id', get the cumulative sum of logical expression (b >= 5). For 'id' 2, the first element that is greater than or equal to 5 is at position 2 (in the grouped position), thus the first row will be 0. Inorder to make this 1, an option is to convert it to factor and then coerce to integer so that we get the integer storage values (R indexing starts from 1)

DT[, count := as.integer(factor(cumsum(b >= 5))), id]

-output

DT
    id  b count
 1:  1 10     1
 2:  1  5     2
 3:  1  3     2
 4:  1  4     2
 5:  1  2     2
 6:  1  6     3
 7:  1  1     3
 8:  2  3     1
 9:  2  5     2
10:  2  7     3
like image 167
akrun Avatar answered May 03 '23 09:05

akrun