I am working with a multi-year dataset that has columns for date (%Y-%m-%d) and daily values for several variables.
In R, how do I subset the data by a date range (i.e., June 29 +/- 5 days) but capture the data from all years?
DATE A B C
1996-06-10 12:00:00 178.0 24.1 1.7
1996-06-11 12:00:00 184.1 30.2 1.1
1996-06-12 12:00:00 187.2 29.4 1.8
1996-06-13 12:00:00 194.4 35.0 5.3
1996-06-14 12:00:00 200.3 35.9 1.5
1996-06-15 12:00:00 138.9 15.1 0.0
...
Let yrs be all unique years in the data and targets be each of those years with the target's month and day. Then create dates which contains all dates within delta days of any value in targets. Note that sapply strips dates of its "Date" class but that does not matter since it is only subsequently used in %in% and that ignores the class. Finally subset DF down to those rows whose DATE is in dates. No packages are used.
# inputs (also DF defined in Note at end)
target <- "06-19"
delta <- 5
DATE <- as.Date(DF$DATE)
yrs <- unique(format(DATE, "%Y"))
targets <- as.Date(paste(yrs, target, sep = "-"))
dates <- c(sapply(targets, "+", seq(-delta, delta)))
DF[DATE %in% dates, ]
giving:
DATE A B C
5 1996-06-14 12:00:00 200.3 35.9 1.5
6 1996-06-15 12:00:00 138.9 15.1 0.0
Alternately, this can be done using a single SQL statement. Note that we assume that the DATE column is character since the question referred to it being in a particular format. Now, using the same inputs the inner select generates target dates from each year and then the outer select joins DF to those rows within delta days of any target date. We use the H2 database backend here since it has better date support than SQLite.
library(sqldf)
library(RH2)
# inputs (also DF defined in Note at end)
target <- "06-19"
delta <- 5
fn$sqldf("select DF.* from DF
join (select distinct cast(substr(DATE, 1, 4) || '-' || '$target' as DATE) as target
from DF)
on cast(substr(DATE, 1, 10) as DATE) between target - $delta and target + $delta")
giving:
DATE A B C
1 1996-06-14 12:00:00 200.3 35.9 1.5
2 1996-06-15 12:00:00 138.9 15.1 0.0
We could simplify the SQL somewhat if DATE is of R's "Date" class. That is, replace the sqldf statement above with:
DF2 <- transform(DF, DATE = as.Date(DATE))
fn$sqldf("select DF2.* from DF2
join (select distinct cast(year(DATE) || '-' || '$target' as DATE) as target from DF2)
on DATE between target - $delta and target + $delta")
giving:
DATE A B C
1 1996-06-14 200.3 35.9 1.5
2 1996-06-15 138.9 15.1 0.0
The input DF is assumed to be:
DF <- structure(list(DATE = c("1996-06-10 12:00:00", "1996-06-11 12:00:00",
"1996-06-12 12:00:00", "1996-06-13 12:00:00", "1996-06-14 12:00:00",
"1996-06-15 12:00:00"), A = c(178, 184.1, 187.2, 194.4, 200.3,
138.9), B = c(24.1, 30.2, 29.4, 35, 35.9, 15.1), C = c(1.7, 1.1,
1.8, 5.3, 1.5, 0)), .Names = c("DATE", "A", "B", "C"), row.names = c(NA,
-6L), class = "data.frame")
A base R attempt.
Stealing the example data from the other answer by Kevin:
df <- data.frame(
my_date = seq.Date(as.Date("1990-01-01"), as.Date("1999-12-31"), by = 1),
x = rnorm(3652),
y = rnorm(3652),
z = rnorm(3652)
)
Set your variables for the selection:
month_num <- 6
day_num <- 29
bound <- 5
Find the key dates in your range of years:
keydates <- as.Date(sprintf(
"%d-%02d-%02d",
do.call(seq, as.list(as.numeric(range(format(df$my_date, "%Y"))))),
month_num,
day_num
))
Make a selection:
out <- df[df$my_date %in% outer(keydates, -bound:bound, `+`),]
Check that it worked:
table(format(out$my_date, "%m-%d"))
#06-24 06-25 06-26 06-27 06-28 06-29 06-30 07-01 07-02 07-03 07-04
# 10 10 10 10 10 10 10 10 10 10 10
One valid value for each day/month for each year 1990 to 1999, centred on "06-29" with a range of 5 days either side
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With