Filling the 0-1 column with cumulative sums in between 1s

Tags: r

I have data that looks like this:

id <- c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4)
start <- c(NA, NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA)
e <- as.data.frame(cbind(id, start))

I would like to fill the NAs with something like a cumulative sum: a counter that starts over whenever start == 1 or a new id begins. I wrote a for-loop, but my actual data is so long that the loop would take days to finish. Is there a way to speed up my solution? My target variable can be reproduced as follows:

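e$target <- NA
for (i in 2:nrow(e)) {
  if (e$id[i] != e$id[i-1]) {
    # first row of a new id: nothing to count from yet
    e$target[i] <- NA
  } else {
    # same id: continue the count (NA + 1 stays NA before the first start)
    e$target[i] <- e$target[i-1] + 1
    # a start flag resets the counter to 0
    if (!is.na(e$start[i]) && e$start[i] == 1) {
      e$target[i] <- 0
    }
  }
}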
user3349993 asked Nov 30 '25 01:11


2 Answers

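We can do this with data.table. Group by id and by a run counter, cumsum(!is.na(start)), which increases by one at every start, and number the rows of each group from 0. Then set the rows that come before the first start in each id (or every row of an id that has no start at all) back to NA.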

library(data.table)
setDT(e)[,  target1 := seq_len(.N)-1,.(grp = cumsum(!is.na(start)), id)]
e[e[, c(.I[all(is.na(start))], .I[seq_len(which.max(!is.na(start))-1)]),
                  id]$V1, target1 := NA]
e
#    id start target target1
# 1:  1    NA     NA      NA
# 2:  1    NA     NA      NA
# 3:  1    NA     NA      NA
# 4:  1     1      0       0
# 5:  1    NA      1       1
# 6:  1    NA      2       2
# 7:  2    NA     NA      NA
# 8:  2    NA     NA      NA
# 9:  2     1      0       0
#10:  2    NA      1       1
#11:  3    NA     NA      NA
#12:  3    NA     NA      NA
#13:  3     1      0       0
#14:  3    NA      1       1
#15:  3    NA      2       2
#16:  3    NA      3       3
#17:  3    NA      4       4
#18:  3    NA      5       5
#19:  3     1      0       0
#20:  4    NA     NA      NA
#21:  4    NA     NA      NA
#22:  4    NA     NA      NA
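
The same grouping idea can also be sketched in base R, without any packages. This is a minimal sketch, not part of the answer above: the names run, pos and target_base are made up for illustration, and it assumes the rows are already ordered within each id.

```r
# Example data from the question
id <- c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4)
start <- c(NA, NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA)
e <- data.frame(id, start)

# Run counter within each id: 0 before the first start, then 1, 2, ...
run <- ave(as.integer(!is.na(e$start)), e$id, FUN = cumsum)

# Position of each row inside its (id, run) block, counted from 0
pos <- ave(seq_along(run), e$id, run, FUN = seq_along) - 1L

# Rows before the first start of an id (run == 0) stay NA
e$target_base <- ifelse(run == 0L, NA_integer_, pos)
```

One difference from the loop in the question: if the very first row of an id is itself a start, this sketch gives it 0, whereas the loop leaves it as NA.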
akrun answered Dec 01 '25 16:12


You can try the tidyverse. Use fill to carry the latest non-NA entry down within each id, then replace those filled values with a sequence of the same length (the -1 makes the sequence start at 0). The last replace sets every row where start == 1 back to 0.

library(tidyverse)

e %>% 
 group_by(id) %>% 
 mutate(target = start) %>% 
 fill(target) %>% 
 mutate(target = replace(target, !is.na(target), seq(length(target[!is.na(target)]))-1), 
        target = replace(target, start == 1, 0))
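
Note that in the pipeline above only the start row itself is reset to 0; rows after a second start in the same id would keep counting from the first one. In the example data this does not show, because the second start in id 3 is the last row of that group. A possible variant that restarts the count at every start is sketched below. It is an illustration only: the helper column run and the output column target2 are assumed names, and it loads dplyr alone rather than the whole tidyverse.

```r
library(dplyr)

# Example data from the question
id <- c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4)
start <- c(NA, NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA)
e <- data.frame(id, start)

res <- e %>%
  group_by(id) %>%
  # run is 0 before the first start in an id and grows by 1 at each start
  mutate(run = cumsum(!is.na(start))) %>%
  group_by(id, run) %>%
  # count rows from 0 inside each run; rows with run == 0 stay NA
  mutate(target2 = if_else(run == 0L, NA_integer_, row_number() - 1L)) %>%
  ungroup() %>%
  select(-run)
```

Here res$target2 should match the target column produced by the loop in the question for this data.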
Sotos answered Dec 01 '25 15:12