r - Efficiently create variable indicating if date variable precedes event (by group)

I have two dates (date1 and date2) and an id variable in a data.frame:

dat <- data.frame(c('2014-02-11', '2014-05-04', '2014-05-22'), c('2014-04-12', '2014-09-22', '2014-07-04'), c('a', 'a', 'b'))
names(dat) <- c('date1', 'date2', 'id')
dat$date1 <- as.Date(dat$date1, format = '%Y-%m-%d')
dat$date2 <- as.Date(dat$date2, format = '%Y-%m-%d')
> dat
       date1      date2 id
1 2014-02-11 2014-04-12  a
2 2014-05-04 2014-09-22  a
3 2014-05-22 2014-07-04  b

I would like to create a new variable, var, that indicates whether any date2 value within the same id precedes that row's date1 value (not just the date2 value in the immediately preceding row):

> dat
       date1      date2 id var
1 2014-02-11 2014-04-12  a   0
2 2014-05-04 2014-09-22  a   1
3 2014-05-22 2014-07-04  b   0

I've been able to achieve this with the following loop:

library(dplyr)

ids <- as.vector(unique(unlist(dat$id)))
dat$var <- as.numeric(0)
for (i in ids) {
  # all date2 values belonging to the current id
  date2s <- as.vector(unlist(filter(dat, id == i)$date2))
  for (j in date2s) {
    # flag rows of this id whose date1 falls after any date2 of the same id
    dat <- dat %>% mutate(var = replace(var, (j < date1) & (id == i), 1))
  }
}

However, my data set is quite large, and I would like to achieve this using data.table if possible, though I'm happy to approach this with dplyr if there's an efficient approach.

asked Mar 02 '18 04:03 by kathystehl



1 Answer

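One option, building on the self-join suggested by @thelatemail, is to join dat to itself on id and evaluate the condition once per row with by=.EACHI:

library(data.table)
setDT(dat)  # the join syntax below requires dat to be a data.table
dat[dat, .(date1=i.date1, date2=i.date2, var=any(date2 < i.date1)), by=.EACHI, on=.(id)]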

#   id      date1      date2   var
#1:  a 2014-02-11 2014-04-12 FALSE
#2:  a 2014-05-04 2014-09-22  TRUE
#3:  b 2014-05-22 2014-07-04 FALSE
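
As a side note that is not one of the methods timed in this answer: "any date2 within the id precedes date1" is equivalent to "the earliest date2 within the id precedes date1", so the flag can also be sketched with a grouped minimum and no join. The snippet below rebuilds the example data from the question and assumes there are no missing dates (with NA values, min() would return NA for the whole group unless na.rm = TRUE is used).

```r
library(data.table)

# Example data from the question, with proper Date columns
dat <- data.table(
  date1 = as.Date(c("2014-02-11", "2014-05-04", "2014-05-22")),
  date2 = as.Date(c("2014-04-12", "2014-09-22", "2014-07-04")),
  id    = c("a", "a", "b")
)

# Within each id, compare every date1 against the earliest date2 of that id.
# Assumption: no NA dates in either column.
dat[, var := as.integer(min(date2) < date1), by = id]

print(dat)
```

This makes a single pass per group instead of comparing every pair of rows, so it should scale well, but it has not been benchmarked here.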

Edit: some timing for reference

library(data.table)
set.seed(2L)
N <- 1e5
dat <- data.table(date1=sample(seq(as.Date("1970-01-01"), Sys.Date(), by="1 day"), N, replace=TRUE), 
    date2=sample(seq(as.Date("1970-01-01"), Sys.Date(), by="1 day"), N, replace=TRUE),
    id=sample(letters, N, replace=TRUE))

dt1 <- copy(dat)
tlmMtd <- function() {
    dt1[, rownum := .I]
    dt1[dt1[dt1, on="id", rownum[i.date2 < date1], allow.cartesian=TRUE], hit := 1]
}

dt2 <- copy(dat)
csMtd <- function() dt2[dt2, .(date1=i.date1, date2=i.date2, var=any(date2 < i.date1)), by=.EACHI, on=.(id)]


dt3 <- copy(dat)
frankMtd <- function() dt3[, v := .SD[copy(.SD), on=.(id, date2 < date1), .N, by=.EACHI]$N > 0L]

microbenchmark::microbenchmark(
    tlmMtd(),
    csMtd(),
    frankMtd(),
    times=5L)

# Unit: milliseconds
#        expr        min         lq       mean     median         uq       max neval
#    tlmMtd() 18528.9799 18652.2217 23486.4213 19116.8014 21140.5923 39993.511     5
#     csMtd()  3801.2146  3943.6201  4984.6274  5341.4322  5673.6878  6163.182     5
#  frankMtd()   176.4477   177.5576   191.9636   178.9564   182.0311   244.825     5

answered Sep 17 '22 19:09 by chinsoon12