I have a main table that contains the dates of main events for each personid:
dfMain <- data.frame(last = c("2017-08-01", "2017-08-01", "2017-08-05","2017-09-02","2017-09-02"),
previous = c(NA, NA, "2017-08-01", "2017-08-05", "2017-08-01"),
personid = c(12341, 122345, 12341, 12341, 122345),
diff = c(NA, NA, 4, 28, 32))
(The NAs in the "previous" and "diff" variables indicate that this personid had their first "main event", i.e. there is no previous date and no time difference.)
I also have a secondary table that contains the dates of "secondary events" for each personid:
dfSecondary <- data.frame(date = c("2017-09-01", "2017-08-30", "2017-08-04", "2017-08-02", "2017-08-02"),
personid = c(122345, 122345, 12341, 122345, 12341))
My question is: given the volume of my data, what is the most efficient way to augment "dfMain" with the number of unique secondary events that fall between the main event dates for each personid?
In the dummy example my objective is to get this table:
Occurances <- c(NA, NA, 2, 0, 3)
dfObjective <- data.frame(dfMain, Occurances)
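To make the required logic explicit, here is a deliberately naive base-R version (far too slow for my real data volumes, but it should reproduce the Occurances vector above; OccCheck is just a throwaway name):

# naive reference implementation: for each dfMain row, count the unique
# secondary dates strictly between 'previous' and 'last' for that personid
p     <- as.Date(dfMain$previous)
l     <- as.Date(dfMain$last)
sdate <- as.Date(dfSecondary$date)
OccCheck <- vapply(seq_len(nrow(dfMain)), function(i) {
  if (is.na(p[i])) return(NA_integer_)                    # first main event: no interval yet
  d <- sdate[dfSecondary$personid == dfMain$personid[i]]  # this person's secondary dates
  length(unique(d[d > p[i] & d < l[i]]))                  # unique dates strictly inside the interval
}, integer(1))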
Using the data.table-package:
# load 'data.table' package and convert date-columns to date-class
library(data.table)
setDT(dfMain)[, 1:2 := lapply(.SD, as.IDate), .SDcols = 1:2][]
setDT(dfSecondary)[, date := as.IDate(date)][]
# create a reference
dfSecondary <- dfSecondary[dfMain
, on = .(personid, date > previous, date < last)
, .(dates = x.date)
, by = .EACHI]
setnames(dfSecondary, 2:3, c('previous','last'))
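At this point dfSecondary is a long reference table with one row per interval and matching secondary date, and an NA row where no secondary event falls inside the interval. For the dummy data it should look roughly like:

   personid   previous       last      dates
      12341       <NA> 2017-08-01       <NA>
     122345       <NA> 2017-08-01       <NA>
      12341 2017-08-01 2017-08-05 2017-08-04
      12341 2017-08-01 2017-08-05 2017-08-02
      12341 2017-08-05 2017-09-02       <NA>
     122345 2017-08-01 2017-09-02 2017-09-01
     122345 2017-08-01 2017-09-02 2017-08-30
     122345 2017-08-01 2017-09-02 2017-08-02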
# join and summarise
dfMain[na.omit(dfSecondary, cols = 1:3)[, sum(!is.na(dates), na.rm = TRUE)
, by = .(personid, previous, last)]
, on = .(personid, previous, last)
, Occ := V1][]
which gives:
         last   previous personid diff Occ
1: 2017-08-01       <NA>    12341   NA  NA
2: 2017-08-01       <NA>   122345   NA  NA
3: 2017-08-05 2017-08-01    12341    4   2
4: 2017-09-02 2017-08-05    12341   28   0
5: 2017-09-02 2017-08-01   122345   32   3
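As a quick sanity check, the new Occ column should match the Occurances vector defined in the question:

# compare against the expected result from the question
all.equal(as.numeric(dfMain$Occ), Occurances)   # should be TRUE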