Problem: I would like to fill a value backwards from occurrence by group with a condition. I am trying to generate column C in the desired output.
Set C equal to B and fill 1 backwards if A is <= 35, stop fill if A > 35.
I am trying to complete this task using dplyr.
Building on something similar to my previous question: Fill value backwards from occurence by group
Input:
DAT_in = data.frame(ID=c(1,1,1,1,
2,2,2,
3,3,3,
4,4,4,4,4),
time=c(1,2,3,4,
1,2,3,
1,2,3,
1,2,3,4,5),
A=c(100,35,25,0,
100,75,55,
100,28,25,
100,30,45,25,0),
B=c(0,0,0,1,
0,0,0,
0,0,1,
0,0,0,0,1))
Desired output (C):
DAT_out = data.frame(ID=c(1,1,1,1,
2,2,2,
3,3,3,
4,4,4,4,4),
time=c(1,2,3,4,
1,2,3,
1,2,3,
1,2,3,4,5),
A=c(100,35,25,0,
100,75,55,
100,28,25,
100,30,45,25,0),
B=c(0,0,0,1,
0,0,0,
0,0,1,
0,0,0,0,1),
C=c(0,1,1,1,
0,0,0,
0,1,1,
0,0,0,1,1))
Initially this question was aimed at dplyr
so here's the dplyr
solution I came up with. It's much less elegant than Frank's solution, but I already did it so why not write it down "for posterity".
Dat_out_step1 <- DAT_in %>%
group_by(ID) %>%
filter(B==1) %>%
select(-A,-B) %>%
summarize(max.time = min(time)) %>%
full_join(DAT_in, by = "ID")
Dat_out_step2 <- Dat_out_step1 %>%
group_by(ID) %>%
filter(A>35 & time < max.time) %>%
select(-A, -B, -max.time) %>%
group_by(ID) %>%
summarize(min.time = max(time))
DAT_out_step3 <- Dat_out_step1 %>%
left_join(Dat_out_step2) %>%
mutate(C = ifelse(is.na(max.time), 0,
(time > min.time & time <= max.time)*1)) %>%
select(-max.time, -min.time)
EDIT:
To Frank's suggestion, instead of using ifelse()
you can use (in the last block):
DAT_out_step3 <- Dat_out_step1 %>%
left_join(Dat_out_step2) %>%
mutate(C = replace((time > min.time & time <= max.time)*1 ,is.na(max.time), 0)) %>%
select(-max.time, -min.time)
Old ifelse()
habits die hard... Thanks Frank for spotting that one out.
If you are looking for dplyr
solution, will this work?
DAT_in2 <- DAT_in %>%
mutate(C = ifelse(A <= 35 & lead(A) <= 35, 1, B)) %>%
mutate(C = ifelse(row_number() == n(), B, C))
# Check if DAT_in2 and DAT_out are the same
identical(DAT_in2, DAT_out)
[1] TRUE
This seems to work:
library(data.table)
setDT(DAT_in)
DAT_in[order(ID, -time), C := as.integer(cumsum(A > 35) == 0L), by=ID][]
all.equal(DAT_in$C, DAT_out$C) # TRUE
How it works
1
until A > 35
.DT[i, v := ..., by=g]
only orders by i
during the assignment to column v
; the initial sort order of DT
is preserved.
as.integer
coerces TRUE to 1; FALSE to 0.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With