I have a data frame that contains multiple subjects (column id), each with repeated observations recorded at the times in column time. Each observation may or may not be associated with an event (column event). An example data frame can be generated with:
set.seed(12345)
id <- c(rep(1, 9), rep(2, 9), rep(3, 9))
time <- c(seq(from = 0, to = 96, by = 12),
seq(from = 0, to = 80, by = 10),
seq(from = 0, to = 112, by = 14))
random <- runif(n = 27)
event <- rep(100, 27)
df <- data.frame(cbind(id, time, event, random))
df$event <- ifelse(df$random < 0.55, 0, df$event)
df <- subset(df, select = -c(random))
df$event <- ifelse(df$time == 0, 100, df$event)
I would like to calculate the time since the most recent event (tae, "time after event"), so that the ideal output looks like:
head(ideal_df)
id time event tae
1 1 0 100 0
2 1 12 100 0
3 1 24 100 0
4 1 36 100 0
5 1 48 0 12
6 1 60 0 24
In Fortran, I use the following code to create the tae variable:
IF(EVENT.GT.0) THEN
TEVENT = TIME
TAE = 0
ENDIF
IF(EVENT.EQ.0) THEN
TAE = TIME - TEVENT
ENDIF
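The Fortran logic relies on TEVENT persisting from one row to the next, so its closest R equivalent is an explicit loop. This is a minimal base R sketch, assuming rows are sorted by id and then by time and that the remembered event time should be reset for each new id; the helper name tae_loop is hypothetical. Rows that come before an id's first event get NA.

```r
# Hypothetical helper: row-by-row translation of the Fortran logic.
# Assumes rows are sorted by id and then by time.
tae_loop <- function(id, time, event) {
  tae <- numeric(length(time))
  tevent <- NA_real_                 # time of the most recent event
  for (i in seq_along(time)) {
    # New subject: forget the previous subject's event time
    if (i == 1 || id[i] != id[i - 1]) tevent <- NA_real_
    if (event[i] > 0) {              # IF(EVENT.GT.0): remember this event time
      tevent <- time[i]
      tae[i] <- 0
    } else {                         # IF(EVENT.EQ.0), assuming events are never negative
      tae[i] <- time[i] - tevent
    }
  }
  tae
}

# Second example from the question (a single subject)
ex <- data.frame(id = rep(1, 9),
                 time = c(0, 10, 22, 33, 45, 57, 66, 79, 92),
                 event = c(100, 0, 0, 100, 0, 100, 0, 0, 100))
ex$tae <- tae_loop(ex$id, ex$time, ex$event)
ex
```

A loop is easy to check against the Fortran, but it is slow on large data; vectorized alternatives avoid the per-row overhead.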
In R, I have attempted both an ifelse and a dplyr solution; however, neither produces my desired output.
# Calculate the time since last event (using ifelse)
df$tae <- ifelse(df$event >= 0, df$tevent = df$time & df$tae = 0, df$tae = df$time - df$tevent)
Error: unexpected '=' in "df$tae <- ifelse(df$event >= 0, df$tevent ="
# Calculate the time since last event (using dplyr)
library(dplyr)
res <- df %>%
  arrange(id, time) %>%
  group_by(id) %>%
  mutate(tae = time - lag(time))
res
id time event tae
1 1 0 100 NA
2 1 12 100 12
3 1 24 100 12
4 1 36 100 12
5 1 48 0 12
6 1 60 0 12
Clearly, neither of these yields my desired output. It appears that R does not accept assignments inside the arguments of ifelse (the = is read as naming an argument, hence the syntax error). My dplyr attempt also ignores the event variable: it returns the gap between consecutive observations rather than the time since the last event.
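Because ifelse operates on whole vectors rather than row by row, it cannot update an event time as it goes, but it can still mark which times are event times. A running maximum within each id then carries the most recent event time forward. This is a minimal base R sketch, assuming rows are sorted by id and then by time (so the latest event seen so far is also the largest event time); the names ev_time and tevent are illustrative. Rows before an id's first event would get a tae of Inf.

```r
# Second example from the question (a single subject)
ex <- data.frame(id = rep(1, 9),
                 time = c(0, 10, 22, 33, 45, 57, 66, 79, 92),
                 event = c(100, 0, 0, 100, 0, 100, 0, 0, 100))

# Event rows keep their time; other rows get -Inf so cummax ignores them
ev_time <- ifelse(ex$event > 0, ex$time, -Inf)

# Running maximum within each id = time of the most recent event so far
ex$tevent <- ave(ev_time, ex$id, FUN = cummax)
ex$tae <- ex$time - ex$tevent
ex
```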
Lastly, I will also need another variable recording the time until the next event (tue). If anyone has thoughts on how best to approach this (perhaps trickier) calculation, please feel free to share.
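Time until the next event is the mirror image of time since the last event: mark the event times, then take a running minimum from the end of each id backwards. This is a minimal base R sketch, assuming rows are sorted by id and then by time; ev_time and next_event are illustrative names. As a design choice, rows with no later event in their id get NA rather than 0.

```r
# Second example from the question (a single subject)
ex <- data.frame(id = rep(1, 9),
                 time = c(0, 10, 22, 33, 45, 57, 66, 79, 92),
                 event = c(100, 0, 0, 100, 0, 100, 0, 0, 100))

# Event rows keep their time; other rows get Inf so cummin ignores them
ev_time <- ifelse(ex$event > 0, ex$time, Inf)

# Running minimum taken from the end of each id backwards = next event time
next_event <- ave(ev_time, ex$id, FUN = function(x) rev(cummin(rev(x))))

ex$tue <- next_event - ex$time
# Rows with no later event in their id get NA instead of Inf
ex$tue[is.infinite(ex$tue)] <- NA
ex
```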
Any thoughts regarding how to get one of these working (or an alternative solution) would be greatly appreciated. Thanks!
P.S.: Below is a reproducible example in which the interval between events changes within an ID:
id <- rep(1, 9)
time <- c(0, 10, 22, 33, 45, 57, 66, 79, 92)
event <- c(100, 0, 0, 100, 0, 100, 0, 0, 100)
df <- data.frame(cbind(id, time, event))
head(df)
id time event
1 1 0 100
2 1 10 0
3 1 22 0
4 1 33 100
5 1 45 0
6 1 57 100
Here's an approach with dplyr:
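library(dplyr)
df %>%
  group_by(id) %>%
  # label each run of rows sharing the same event value, within each id
  mutate(tmpG = cumsum(c(FALSE, as.logical(diff(event)))),
         # gap to the previous observation, zeroed on event rows
         tmp_a = c(0, diff(time)) * !event,
         # gap to the next observation, zeroed on event rows
         tmp_b = c(diff(time), 0) * !event) %>%
  group_by(id, tmpG) %>%
  # forward running sum = time after event; backward running sum = time before event
  mutate(tae = cumsum(tmp_a),
         tbe = rev(cumsum(rev(tmp_b)))) %>%
  ungroup() %>%
  select(-c(tmp_a, tmp_b, tmpG))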
The new columns are time after event (tae) and time before event (tbe).
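To see why the helper columns work, here is the arithmetic for one run of non-event rows from the second example (times 10 and 22, between events at 0 and 33). tmp_a holds each row's gap to the previous observation and tmp_b its gap to the next one; a running sum forwards turns the first into the distance from the preceding event, and a running sum from the end backwards turns the second into the distance to the following event. The values are typed in by hand for illustration.

```r
# Non-event rows at times 10 and 22, preceded by an event at 0
# and followed by an event at 33
tmp_a <- c(10, 12)  # gaps to the previous row: 10 - 0, 22 - 10
tmp_b <- c(12, 11)  # gaps to the next row: 22 - 10, 33 - 22

tae <- cumsum(tmp_a)            # running sum forwards
tbe <- rev(cumsum(rev(tmp_b)))  # running sum from the end backwards
tae  # 10 22
tbe  # 23 11
```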
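The result (the times below are spaced 12 apart for every id, whereas the question's code uses spacings of 10 and 14 for ids 2 and 3, so this output appears to come from an earlier version of the example data):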
id time event tae tbe
1 1 0 100 0 0
2 1 12 100 0 0
3 1 24 100 0 0
4 1 36 100 0 0
5 1 48 0 12 48
6 1 60 0 24 36
7 1 72 0 36 24
8 1 84 0 48 12
9 1 96 100 0 0
10 2 0 100 0 0
11 2 12 0 12 24
12 2 24 0 24 12
13 2 36 100 0 0
14 2 48 0 12 48
15 2 60 0 24 36
16 2 72 0 36 24
17 2 84 0 48 12
18 2 96 0 60 0
19 3 0 100 0 0
20 3 12 100 0 0
21 3 24 0 12 24
22 3 36 0 24 12
23 3 48 100 0 0
24 3 60 100 0 0
25 3 72 100 0 0
26 3 84 0 12 12
27 3 96 100 0 0
The result with the second example:
id time event tae tbe
1 1 0 100 0 0
2 1 10 0 10 23
3 1 22 0 22 11
4 1 33 100 0 0
5 1 45 0 12 12
6 1 57 100 0 0
7 1 66 0 9 26
8 1 79 0 22 13
9 1 92 100 0 0
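An alternative way to get the same two columns, sketched here with tidyr::fill (a swapped-in technique, not part of the answer above), is to record each event's own time, copy it down to later rows to get the previous event and up to earlier rows to get the next event, then subtract. This assumes rows are sorted by id and then by time; last_ev and next_ev are illustrative column names. Unlike the cumsum version, rows with no later event in their id get NA for tbe instead of 0.

```r
library(dplyr)
library(tidyr)

# Second example from the question
ex <- data.frame(id = rep(1, 9),
                 time = c(0, 10, 22, 33, 45, 57, 66, 79, 92),
                 event = c(100, 0, 0, 100, 0, 100, 0, 0, 100))

res <- ex %>%
  arrange(id, time) %>%
  group_by(id) %>%
  # event rows record their own time; all other rows start as NA
  mutate(last_ev = if_else(event > 0, time, NA_real_),
         next_ev = last_ev) %>%
  # within each id, copy event times down (previous event) and up (next event)
  fill(last_ev, .direction = "down") %>%
  fill(next_ev, .direction = "up") %>%
  mutate(tae = time - last_ev,
         tbe = next_ev - time) %>%
  ungroup() %>%
  select(-c(last_ev, next_ev))
res
```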