I asked a similar question before and got great help: R: Aggregating History By ID By Date
The difference was that for the former post I was interested in aggregating ALL historical information, but now I am hoping to specify only 90 days back.
Here is an example of how my data might look:
strDates <- c("09/09/16", "5/7/16", "5/6/16", "2/13/16", "2/11/16","1/7/16",
"11/8/16","6/8/16", "5/8/16","2/13/16","1/3/16", "1/1/16")
Date<-as.Date(strDates, "%m/%d/%y")
ID <- c("A", "A", "A", "A","A", "A", "B","B","B","B","B", "B")
Event <- c(1,0,1,0,1,1, 0,1,1,1,0, 1)
sample_df <- data.frame(Date,ID,Event)
As well as the output:
Background Information
I want to keep all attached information per encounter, but then aggregate the following historical information by id back to 90 days.
Example
As an example, let's look at Row 2.
Row 2 is ID A, so I would reference Rows 3-6 (which occurred prior to Row 2 Encounter). Within this group of rows, we see that Rows 3,4,5 and all occurred within the last 90 days, with Row 6 happening outside of the time of interest.
Number of Previous Encounters in last 90 days from Row 2: 3 Encounters
Number of Previous Events in last 90 days from Row 2: 2 Events (5/6/16 and 2/11/16)
Desired Output
Ideally, I would get the following output:
Here's an alternative data.table
solution that should be very efficient. This utilizes the new non-equi joins that were introduced in v 1.10.0 combined with by = .EACHI
which allows you do calculations per join while joining
library(data.table) #v1.10.0
setDT(sample_df)[, Date2 := Date - 90] # Set range (Maybe in future this could be avoided)
sample_df[sample_df, # Binary join with itself
.(Enc90D = .N, Ev90D = sum(Event, na.rm = TRUE)), # Make calculations
on = .(ID = ID, Date < Date, Date > Date2), # Join by
by = .EACHI] # Do calculations per each match
# ID Date Date Enc90D Ev90D
# 1: A 2016-09-09 2016-06-11 0 0
# 2: A 2016-05-07 2016-02-07 3 2
# 3: A 2016-05-06 2016-02-06 2 1
# 4: A 2016-02-13 2015-11-15 2 2
# 5: A 2016-02-11 2015-11-13 1 1
# 6: A 2016-01-07 2015-10-09 0 0
# 7: B 2016-11-08 2016-08-10 0 0
# 8: B 2016-06-08 2016-03-10 1 1
# 9: B 2016-05-08 2016-02-08 1 1
# 10: B 2016-02-13 2015-11-15 2 1
# 11: B 2016-01-03 2015-10-05 1 1
# 12: B 2016-01-01 2015-10-03 0 0
A partially vectorized dplyr
solution, where you can combine do
(to loop through groups) and rowwise
operation (so that you can refer Date as the Date at each row, and .$Date
as the whole Date
column within each group):
sample_df %>%
group_by(ID) %>%
do(rowwise(.) %>%
mutate(PrevEnc90D = sum(Date - .$Date < 90 & Date - .$Date > 0),
PrevEvent90D = sum(.$Event[Date - .$Date < 90 & Date - .$Date > 0])))
#Source: local data frame [12 x 5]
#Groups: ID [2]
# Date ID Event PrevEnc90D PrevEvent90D
# <date> <fctr> <dbl> <int> <dbl>
#1 2016-09-09 A 1 0 0
#2 2016-05-07 A 0 3 2
#3 2016-05-06 A 1 2 1
#4 2016-02-13 A 0 2 2
#5 2016-02-11 A 1 1 1
#6 2016-01-07 A 1 0 0
#7 2016-11-08 B 0 0 0
#8 2016-06-08 B 1 1 1
#9 2016-05-08 B 1 1 1
#10 2016-02-13 B 1 2 1
#11 2016-01-03 B 0 1 1
#12 2016-01-01 B 1 0 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With