I have two data frames "Conc" and "Flow".
Flow has a value for every day for a set period, whereas Conc only has a value on certain days over the period of the period.
What I want to be able to do is calculate the mean Flow values for each period in between the Conc values using r.
The following code will generate two example data frames to illustrate the kind of datasets I am working with:
Conc <- data.frame(Date = as.Date(c("2012/01/13", "2012/02/16", "2012/05/02", "2012/07/28",
"2012/11/10")), Conc = c(0.88, 0.55, 0.34, 0.21, 0.98))
Flow <- data.frame(Date = c(seq(as.Date("2012/01/01"), by = "day", length.out = 365)),
Flow = c(sample(seq(from = 0.01, to = 5, by = 0.1), size = 365, replace = TRUE)))
The output data frame would ideally be something like:
Period Mean_Flow
1 2.01
2 1.41
3 3.81
4 0.31
I appreciate the variable time between Conc days makes this tricky. At present the best I have been to come up with is to manually do this in excel but I would really like to find an R solution to save myself having to do this for about 10 different dataset that I have.
Thank you.
Here's a possible approach using data.table
package foverlaps
function:
Create time intervals in both data sets
library(data.table)
Conc <- setDT(Conc)[, `:=`(start = Date, end = c(Date[2:(.N - 1)] - 1, Date[.N], NA))][-.N]
Flow <- setDT(Flow)[, `:=`(start = Date, end = Date)]
Key the Flow
data set in order to use foverlaps
function and run the function
setkey(Flow, start, end)
overlaps <- foverlaps(Conc, Flow, type = "any", which = TRUE)
Create indexes of the overlaps within the Flow
data set and compute the mean by those indexes
Flow[overlaps$yid, Period := overlaps$xid]
na.omit(Flow[, list(Mean_Flow = mean(Flow)), by = Period])
# Period Mean_Flow
# 1: 1 2.189412
# 2: 2 2.263947
# 3: 3 2.762874
# 4: 4 2.349048
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With