I have a dataframe that contains the dates of multiple types of events.
df <- data.frame(date=as.Date(c("06/07/2000","15/09/2000","15/10/2000"
,"03/01/2001","17/03/2001","23/04/2001",
"26/05/2001","01/06/2001",
"30/06/2001","02/07/2001","15/07/2001"
,"21/12/2001"), "%d/%m/%Y"),
event_type=c(0,4,1,2,4,1,0,2,3,3,4,3))
date event_type
---------------- ----------
1 2000-07-06 0
2 2000-09-15 4
3 2000-10-15 1
4 2001-01-03 2
5 2001-03-17 4
6 2001-04-23 1
7 2001-05-26 0
8 2001-06-01 2
9 2001-06-30 3
10 2001-07-02 3
11 2001-07-15 4
12 2001-12-21 3
I am trying to calculate the days between each event type so the output looks like the below:
date event_type days_since_last_event
---------------- ---------- ---------------------
1 2000-07-06 0 NA
2 2000-09-15 4 NA
3 2000-10-15 1 NA
4 2001-01-03 2 NA
5 2001-03-17 4 183
6 2001-04-23 1 190
7 2001-05-26 0 324
8 2001-06-01 2 149
9 2001-06-30 3 NA
10 2001-07-02 3 2
11 2001-07-15 4 120
12 2001-12-21 3 172
I have benefited from the answers from these two previous posts but have not been able to address my specific problem in R; multiple event types.
Calculate elapsed time since last event
Calculate days since last event in R
Below is as far as I have gotten. I have not been able to leverage the last event index to calculate the last event date.
df <- cbind(df, as.vector(data.frame(count=ave(df$event_type==df$event_type,
df$event_type, FUN=cumsum))))
df <- rename(df, c("count" = "last_event_index"))
date event_type last_event_index
--------------- ------------- ----------------
1 2000-07-06 0 1
2 2000-09-15 4 1
3 2000-10-15 1 1
4 2001-01-03 2 1
5 2001-03-17 4 2
6 2001-04-23 1 2
7 2001-05-26 0 2
8 2001-06-01 2 2
9 2001-06-30 3 1
10 2001-07-02 3 2
11 2001-07-15 4 3
12 2001-12-21 3 3
We can use diff
to get the difference between adjacent 'date' after grouping by 'event_type'. Here, I am using data.table
approach by converting the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'event_type', we get the diff
of 'date'.
library(data.table)
setDT(df)[,days_since_last_event :=c(NA,diff(date)) , by = event_type]
df
# date event_type days_since_last_event
# 1: 2000-07-06 0 NA
# 2: 2000-09-15 4 NA
# 3: 2000-10-15 1 NA
# 4: 2001-01-03 2 NA
# 5: 2001-03-17 4 183
# 6: 2001-04-23 1 190
# 7: 2001-05-26 0 324
# 8: 2001-06-01 2 149
# 9: 2001-06-30 3 NA
#10: 2001-07-02 3 2
#11: 2001-07-15 4 120
#12: 2001-12-21 3 172
Or as @Frank mentioned in the comments, we can also use shift
(from version v1.9.5+
onwards) to get the lag
(by default, the type='lag'
) of 'date' and subtract from the 'date'.
setDT(df)[, days_since_last_event := as.numeric(date-shift(date,type="lag")),
by = event_type]
The base R version of this is to use split/lapply/rbind to generate the new column.
> do.call(rbind,
lapply(
split(df, df$event_type),
function(d) {
d$dsle <- c(NA, diff(d$date)); d
}
)
)
date event_type dsle
0.1 2000-07-06 0 NA
0.7 2001-05-26 0 324
1.3 2000-10-15 1 NA
1.6 2001-04-23 1 190
2.4 2001-01-03 2 NA
2.8 2001-06-01 2 149
3.9 2001-06-30 3 NA
3.10 2001-07-02 3 2
3.12 2001-12-21 3 172
4.2 2000-09-15 4 NA
4.5 2001-03-17 4 183
4.11 2001-07-15 4 120
Note that this returns the data in a different order than provided; you can re-sort by date or save the original indices if you want to preserve that order.
Above, @akrun has posted the data.tables
approach, the parallel dplyr
approach would be straightforward as well:
library(dplyr)
df %>% group_by(event_type) %>% mutate(days_since_last_event=date - lag(date, 1))
Source: local data frame [12 x 3] Groups: event_type [5]
date event_type days_since_last_event
(date) (dbl) (dfft)
1 2000-07-06 0 NA days
2 2000-09-15 4 NA days
3 2000-10-15 1 NA days
4 2001-01-03 2 NA days
5 2001-03-17 4 183 days
6 2001-04-23 1 190 days
7 2001-05-26 0 324 days
8 2001-06-01 2 149 days
9 2001-06-30 3 NA days
10 2001-07-02 3 2 days
11 2001-07-15 4 120 days
12 2001-12-21 3 172 days
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With