 

R - Calculate Time Elapsed Since Last Event with Multiple Event Types

Tags: time, r

I have a data frame that contains the dates of multiple types of events.

df <- data.frame(date = as.Date(c("06/07/2000", "15/09/2000", "15/10/2000",
                                  "03/01/2001", "17/03/2001", "23/04/2001",
                                  "26/05/2001", "01/06/2001", "30/06/2001",
                                  "02/07/2001", "15/07/2001", "21/12/2001"),
                                "%d/%m/%Y"),
                 event_type = c(0, 4, 1, 2, 4, 1, 0, 2, 3, 3, 4, 3))

   date                event_type
   ----------------    ----------
1  2000-07-06          0
2  2000-09-15          4
3  2000-10-15          1
4  2001-01-03          2
5  2001-03-17          4
6  2001-04-23          1
7  2001-05-26          0
8  2001-06-01          2
9  2001-06-30          3
10 2001-07-02          3
11 2001-07-15          4
12 2001-12-21          3

For each row, I am trying to calculate the number of days since the previous event of the same type, so that the output looks like this:

   date                event_type          days_since_last_event
   ----------------    ----------          ---------------------
1  2000-07-06          0                    NA
2  2000-09-15          4                    NA
3  2000-10-15          1                    NA
4  2001-01-03          2                    NA
5  2001-03-17          4                   183
6  2001-04-23          1                   190
7  2001-05-26          0                   324
8  2001-06-01          2                   149
9  2001-06-30          3                    NA
10 2001-07-02          3                     2
11 2001-07-15          4                   120
12 2001-12-21          3                   172
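Each non-NA value is the number of days between a row's date and the date of the previous row with the same event_type; the first event of each type has no predecessor and so gets NA. For example, row 5 (type 4) is compared with row 2 (also type 4). A quick base R check of that one value:

```r
# Rows 2 and 5 are consecutive events of type 4
prev_date <- as.Date("2000-09-15")
curr_date <- as.Date("2001-03-17")

# difftime() in days; as.numeric() drops the difftime class
days <- as.numeric(difftime(curr_date, prev_date, units = "days"))
print(days)  # 183
```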

The answers to these two previous posts helped, but I have not been able to adapt them to my specific problem in R, which involves multiple event types:

Calculate elapsed time since last event

Calculate days since last event in R

Below is as far as I have gotten. I have not been able to use the last event index to look up the date of the previous event of the same type.

# Running count within each event type: 1 for its first occurrence, 2 for its second, ...
df$last_event_index <- ave(rep(1, nrow(df)), df$event_type, FUN = cumsum)

   date                event_type      last_event_index
   ---------------     -------------   ----------------
1  2000-07-06          0                1
2  2000-09-15          4                1
3  2000-10-15          1                1
4  2001-01-03          2                1
5  2001-03-17          4                2
6  2001-04-23          1                2
7  2001-05-26          0                2
8  2001-06-01          2                2
9  2001-06-30          3                1
10 2001-07-02          3                2
11 2001-07-15          4                3
12 2001-12-21          3                3
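One way to turn the index idea into dates is to look up, for each row, the row number of the previous event of the same type, and then subtract that row's date. A minimal base R sketch, assuming the rows are already sorted by date (as in the example); `prev_row` is a hypothetical helper name, not something from the posts above:

```r
df <- data.frame(
  date = as.Date(c("06/07/2000", "15/09/2000", "15/10/2000", "03/01/2001",
                   "17/03/2001", "23/04/2001", "26/05/2001", "01/06/2001",
                   "30/06/2001", "02/07/2001", "15/07/2001", "21/12/2001"),
                 "%d/%m/%Y"),
  event_type = c(0, 4, 1, 2, 4, 1, 0, 2, 3, 3, 4, 3)
)

# For each row, the row number of the previous event of the same type
# (NA for the first event of each type). Assumes rows are sorted by date.
prev_row <- ave(seq_len(nrow(df)), df$event_type,
                FUN = function(i) c(NA, head(i, -1)))

# Indexing a Date vector with NA yields NA, so first events get NA days
df$days_since_last_event <- as.numeric(df$date - df$date[prev_row])
print(df)
```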
asked Sep 22 '15 16:09 by BeeGee


2 Answers

We can use diff() to get the difference between adjacent dates after grouping by 'event_type'. Here I use the data.table approach: convert the data.frame to a data.table with setDT(df), group by 'event_type', and take the diff of 'date', padding the first event of each group with NA.

library(data.table)
# Within each event_type, c(NA, diff(date)) gives the days since the previous event of that type
setDT(df)[, days_since_last_event := c(NA, diff(date)), by = event_type]
df
#          date event_type days_since_last_event
# 1: 2000-07-06          0                    NA
# 2: 2000-09-15          4                    NA
# 3: 2000-10-15          1                    NA
# 4: 2001-01-03          2                    NA
# 5: 2001-03-17          4                   183
# 6: 2001-04-23          1                   190
# 7: 2001-05-26          0                   324
# 8: 2001-06-01          2                   149
# 9: 2001-06-30          3                    NA
#10: 2001-07-02          3                     2
#11: 2001-07-15          4                   120
#12: 2001-12-21          3                   172

Alternatively, as @Frank mentioned in the comments, we can use shift() (available from data.table v1.9.5 onwards) to get the lagged 'date' (type = "lag" is the default) and subtract it from 'date'.

setDT(df)[, days_since_last_event := as.numeric(date-shift(date,type="lag")), 
                                  by = event_type]
answered Nov 14 '22 23:11 by akrun


The base R version of this uses split/lapply/rbind: split the data by event type, add the new column to each piece, and bind the pieces back together.

do.call(rbind,
        lapply(
          split(df, df$event_type),
          function(d) {
            # days since the previous event of the same type
            d$dsle <- c(NA, diff(d$date))
            d
          }
        ))
           date event_type dsle
0.1  2000-07-06          0   NA
0.7  2001-05-26          0  324
1.3  2000-10-15          1   NA
1.6  2001-04-23          1  190
2.4  2001-01-03          2   NA
2.8  2001-06-01          2  149
3.9  2001-06-30          3   NA
3.10 2001-07-02          3    2
3.12 2001-12-21          3  172
4.2  2000-09-15          4   NA
4.5  2001-03-17          4  183
4.11 2001-07-15          4  120

Note that this returns the data in a different order than provided; you can re-sort by date or save the original indices if you want to preserve that order.
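A minimal sketch of the "save the original indices" option, assuming a plain data.frame as in the question; `orig_row` is a hypothetical helper column added only to remember the input order:

```r
df <- data.frame(
  date = as.Date(c("06/07/2000", "15/09/2000", "15/10/2000", "03/01/2001",
                   "17/03/2001", "23/04/2001", "26/05/2001", "01/06/2001",
                   "30/06/2001", "02/07/2001", "15/07/2001", "21/12/2001"),
                 "%d/%m/%Y"),
  event_type = c(0, 4, 1, 2, 4, 1, 0, 2, 3, 3, 4, 3)
)

# Hypothetical helper column recording the original row order
df$orig_row <- seq_len(nrow(df))

res <- do.call(rbind, lapply(split(df, df$event_type), function(d) {
  d$dsle <- c(NA, diff(d$date))  # days since previous event of this type
  d
}))

# Restore input order, then drop the helper column and the
# group-prefixed row names (e.g. "4.11") created by rbind
res <- res[order(res$orig_row), ]
res$orig_row <- NULL
rownames(res) <- NULL
print(res)
```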

Above, @akrun posted the data.table approach; the parallel dplyr approach is just as straightforward:

library(dplyr)
df %>%
  group_by(event_type) %>%
  mutate(days_since_last_event = date - lag(date, 1))

Source: local data frame [12 x 3] Groups: event_type [5]

         date event_type days_since_last_event
       (date)      (dbl)                (dfft)
1  2000-07-06          0               NA days
2  2000-09-15          4               NA days
3  2000-10-15          1               NA days
4  2001-01-03          2               NA days
5  2001-03-17          4              183 days
6  2001-04-23          1              190 days
7  2001-05-26          0              324 days
8  2001-06-01          2              149 days
9  2001-06-30          3               NA days
10 2001-07-02          3                2 days
11 2001-07-15          4              120 days
12 2001-12-21          3              172 days
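The dplyr result is a difftime column (printed as `183 days`). If you would rather stay in base R and get plain numeric days in the original row order, ave() can apply the same c(NA, diff(...)) idea within each event_type. A sketch, again assuming rows are sorted by date within each type:

```r
df <- data.frame(
  date = as.Date(c("06/07/2000", "15/09/2000", "15/10/2000", "03/01/2001",
                   "17/03/2001", "23/04/2001", "26/05/2001", "01/06/2001",
                   "30/06/2001", "02/07/2001", "15/07/2001", "21/12/2001"),
                 "%d/%m/%Y"),
  event_type = c(0, 4, 1, 2, 4, 1, 0, 2, 3, 3, 4, 3)
)

# Work on the numeric day count so ave() returns plain numbers;
# within each event_type, c(NA, diff(d)) is the gap to the previous event
df$days_since_last_event <- ave(as.numeric(df$date), df$event_type,
                                FUN = function(d) c(NA, diff(d)))
print(df)
```

Because ave() writes results back into the positions of each group, no re-sorting step is needed.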
answered Nov 14 '22 23:11 by user295691