 

Subset dataframe where date is within x days of a vector of dates in R

I have a vector of dates e.g.

dates <- c('2013-01-01', '2013-04-02', '2013-06-10', '2013-09-30')

And a dataframe which contains a date column e.g.

df <- data.frame(
                'date' = c('2013-01-04', '2013-01-22', '2013-10-01', '2013-10-10'),
                'a'    = c(1,2,3,4),
                'b'    = c('a', 'b', 'c', 'd')
                )

And I would like to subset the dataframe so that it only contains rows where the date is less than 5 days after any of the dates in the 'dates' vector.

For example, the initial dataframe looks like this:

date       a b 
2013-01-04 1 a
2013-01-22 2 b
2013-10-01 3 c
2013-10-10 4 d

After the query I would be left with only the first and third rows (since 2013-01-04 is within 5 days of 2013-01-01, and 2013-10-01 is within 5 days of 2013-09-30).
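To make the arithmetic behind this expected result explicit, here is a small base R check. It is a sketch: the names `ref`, `cand` and `gap` are illustrative and not from the question. For each row date it computes the gap in days to the latest reference date on or before it, assuming every row date falls after at least one reference date (true for this example):

```r
# Reference dates and candidate row dates from the question, as Date objects
ref  <- as.Date(c('2013-01-01', '2013-04-02', '2013-06-10', '2013-09-30'))
cand <- as.Date(c('2013-01-04', '2013-01-22', '2013-10-01', '2013-10-10'))

# Gap in days from each candidate back to the latest reference date on or before it.
# Assumption: each candidate has at least one earlier-or-equal reference date;
# otherwise max() of an empty Date vector would warn and give no usable gap.
gap <- sapply(seq_along(cand), function(i) {
  prior <- ref[ref <= cand[i]]
  as.numeric(cand[i] - max(prior))
})

gap
# [1]  3 21  1 10
gap <= 5
# [1]  TRUE FALSE  TRUE FALSE
```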

Does anyone know of the best way to do this?

Thanks in advance

asked Oct 07 '13 15:10 by user1165199



2 Answers

This is easy (and very fast) to do with a data.table rolling join:

library(data.table)
dt = data.table(df)

# convert to Date (or IDate) to have numbers instead of strings for dates
# also set the key for dates for the join
dt[, date := as.Date(date)]
dates = data.table(date = as.Date(dates), key = 'date')

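# join with a roll of 5 days: each row of dt matches the latest reference date on or before it,
# if that date is at most 5 days earlier; nomatch = 0 drops rows with no such match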
dates[dt, roll = 5, nomatch = 0]
#         date a b
#1: 2013-01-04 1 a
#2: 2013-10-01 3 c
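For readers without data.table, the same one-sided rule that `roll = 5` applies here (match the most recent reference date on or before each row, at most 5 days back) can be sketched in base R with `findInterval()`. This is an illustrative equivalent under that reading of `roll`, not what data.table does internally; `idx` and `keep` are hypothetical names:

```r
# Reference dates (sorted, as findInterval() requires) and the question's data
dates <- sort(as.Date(c('2013-01-01', '2013-04-02', '2013-06-10', '2013-09-30')))
df <- data.frame(
  date = as.Date(c('2013-01-04', '2013-01-22', '2013-10-01', '2013-10-10')),
  a    = c(1, 2, 3, 4),
  b    = c('a', 'b', 'c', 'd')
)

# For each df$date, the index of the last reference date <= it
# (0 if the row date is earlier than every reference date)
idx <- findInterval(df$date, dates)

# Keep rows that have such a prior reference date at most 5 days back;
# pmax(idx, 1) only avoids a zero index, and idx > 0 discards those rows anyway
keep <- idx > 0 & (as.numeric(df$date) - as.numeric(dates[pmax(idx, 1)])) <= 5
df[keep, ]
```

With the example data this keeps rows 1 and 3, matching the rolling join's result.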
answered Nov 14 '22 23:11 by eddi



Broken down into steps:

# Rows selected: iterate over each row in the data frame
#   and check whether its `date` value is within 5 days (before or after) of any value in the `dates` vector
rows <- sapply(df$date, function(x) any( abs(x-dates) <=  5))

# Use that result to subset your data.frame
df[rows, ]

#         date a b
# 1 2013-01-04 1 a
# 3 2013-10-01 3 c
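The `sapply()` loop calls the anonymous function once per row. A vectorized sketch of the same two-sided test (a gap of at most 5 days before or after any reference date) builds the full matrix of day differences with `outer()`. The names `gaps` and `rows` are illustrative, and converting the dates to plain day counts first is a choice made here so that `outer()` works on ordinary numbers:

```r
dates <- as.Date(c('2013-01-01', '2013-04-02', '2013-06-10', '2013-09-30'))
df <- data.frame(
  date = as.Date(c('2013-01-04', '2013-01-22', '2013-10-01', '2013-10-10')),
  a    = c(1, 2, 3, 4),
  b    = c('a', 'b', 'c', 'd')
)

# Matrix of day differences: one row per df row, one column per reference date
gaps <- outer(as.numeric(df$date), as.numeric(dates), "-")

# A row is kept if any of its gaps is within 5 days in either direction,
# the same test as abs(x - dates) <= 5 in the sapply() version
rows <- rowSums(abs(gaps) <= 5) > 0
df[rows, ]
```

The matrix has one cell per (row, reference date) pair, so memory grows with `nrow(df) * length(dates)`; for large inputs the rolling join is likely to scale better. Note also that `abs()` makes this test two-sided, whereas the question asks for dates after the reference dates.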

Importantly, make sure your date values are actual `Date` objects and not character strings that look like dates; run these conversions before the code above:

dates <- as.Date(dates)
df$date <- as.Date(df$date)
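Since forgetting the conversion is the usual failure mode, one option is to fold it into a small helper. `filter_within_days()` below is hypothetical (it is not part of either answer or any package): it converts both inputs with `as.Date()`, stops if any value failed to parse, and then applies the same `abs(x - dates) <= window` test as the `sapply()` version:

```r
# Hypothetical helper: convert to Date, reject unparseable values, then filter
filter_within_days <- function(df, ref_dates, window = 5) {
  # as.Date() errors if it cannot guess a format from the first non-NA string;
  # later strings that don't match the guessed format become NA
  row_dates <- as.Date(df$date)
  ref_dates <- as.Date(ref_dates)
  if (anyNA(row_dates) || anyNA(ref_dates)) {
    stop("some dates could not be parsed; expected 'YYYY-MM-DD' strings or Date objects")
  }
  # TRUE for rows within `window` days (before or after) of any reference date
  keep <- vapply(row_dates,
                 function(x) any(abs(as.numeric(x - ref_dates)) <= window),
                 logical(1))
  df[keep, , drop = FALSE]
}

# Example data from the question, deliberately left as character strings
df <- data.frame(
  date = c('2013-01-04', '2013-01-22', '2013-10-01', '2013-10-10'),
  a    = c(1, 2, 3, 4),
  b    = c('a', 'b', 'c', 'd')
)
dates <- c('2013-01-01', '2013-04-02', '2013-06-10', '2013-09-30')

res <- filter_within_days(df, dates)
res
```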
answered Nov 14 '22 22:11 by Ricardo Saporta
