I have two data frames, x and y. The data frame x has a range of dates while data frame y has individual dates. I want to get the sum of the individual date values for the time ranges in data frame x. Thus id "a" would have the sum of all the values from 2019/1/1 through 2019/3/1.
id <- c("a","b","c")
start_date <- as.Date(c("2019/1/1", "2019/2/1", "2019/3/1"))
end_date <- as.Date(c("2019/3/1", "2019/4/1", "2019/5/1"))
x <- data.frame(id, start_date, end_date)
dates <- seq(as.Date("2019/1/1"),as.Date("2019/5/1"),1)
values <- runif(121, min=0, max=7)
y <- data.frame(dates, values)
Desired output
id start_date end_date sum
a 2019/1/1 2019/3/1 221.8892
One base R option is using apply
x$sum <- apply(x, 1, function(v) sum(subset(y,dates >= v["start_date"] & dates<=v["end_date"])$values))
such that
> x
id start_date end_date sum
1 a 2019-01-01 2019-03-01 196.0311
2 b 2019-02-01 2019-04-01 185.6970
3 c 2019-03-01 2019-05-01 173.6429
Data
set.seed(1234)
id <- c("a","b","c")
start_date <- as.Date(c("2019/1/1", "2019/2/1", "2019/3/1"))
end_date <- as.Date(c("2019/3/1", "2019/4/1", "2019/5/1"))
x <- data.frame(id, start_date, end_date)
dates <- seq(as.Date("2019/1/1"),as.Date("2019/5/1"),1)
values <- runif(121, min=0, max=7)
y <- data.frame(dates, values)
There are many ways of doing this. One possibility would be:
library(data.table)
x <- setDT(x)
# create a complete series for each id
x <- x[, .(dates = seq(start_date, end_date, 1)), by=id]
# merge the data
m <- merge(x, y, by="dates")
# get the sums
m[, .(sum = sum(values)), by=id]
id sum
1: a 196.0311
2: b 185.6970
3: c 173.6429
You can add setseed before you create the random variables to exactly replicate the numbers
set.seed(1234)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With