The solution to this question by @ShirinYavari was almost what I needed except for the use of the static averaging window width of 2. I have a dataset with random samples from multiple stations that I want to calculate a rolling 30-day geomean. I want all samples within a 30-day window of a given sample to be averaged and the width may change if preceding samples are farther or closer together in time, for instance whether you would need to average 2, 3, or more samples if 1, 2, or more preceding samples were within 30 days of a given sample.
Here is some example data, plus my code attempt:
RESULT = c(50,900,25,25,125,50,25,25,2000,25,25,
25,25,25,25,25,25,325,25,300,475,25)
DATE = as.Date(c("2018-05-23","2018-06-05","2018-06-17",
"2018-08-20","2018-10-05","2016-05-22",
"2016-06-20","2016-07-25","2016-08-11",
"2017-07-21","2017-08-08","2017-09-18",
"2017-10-12","2011-04-19","2011-06-29",
"2011-08-24","2011-10-23","2012-06-28",
"2012-07-16","2012-08-14","2012-09-29",
"2012-10-24"))
FINAL_SITEID = c(rep("A", 5), rep("B", 8), rep("C", 9))
df=data.frame(FINAL_SITEID,DATE,RESULT)
data_roll <- df %>%
group_by(FINAL_SITEID) %>%
arrange(DATE) %>%
mutate(day=DATE-dplyr::lag(DATE, n=1),
day=replace_na(day, 1),
rnk=cumsum(c(TRUE, day > 30))) %>%
group_by(FINAL_SITEID, rnk) %>%
mutate(count=rowid(rnk)) %>%
mutate(GM30=rollapply(RESULT, width=count, geometric.mean, fill=RESULT, align="right"))
I get this error message, which seems like it should be an easy fix, but I can't figure it out:
Error: Column `rnk` must be length 5 (the group size) or one, not 6
Easiest way to compute rolling statistics depending on datetime windows is runner package. You don't have to hack around to get just 30-days windows. Function runner allows you to apply any R function in rolling window. Below example of 30-days geometric.mean within FINAL_SITEID
group:
library(psych)
library(runner)
df %>%
group_by(FINAL_SITEID) %>%
arrange(DATE) %>%
mutate(GM30 = runner(RESULT, k = 30, idx = DATE, f = geometric.mean))
# FINAL_SITEID DATE RESULT GM30
# <fct> <date> <dbl> <dbl>
# 1 C 2011-04-19 25 25.0
# 2 C 2011-06-29 25 25.0
# 3 C 2011-08-24 25 25.0
# 4 C 2011-10-23 25 25.0
# 5 C 2012-06-28 325 325.
# 6 C 2012-07-16 25 90.1
# 7 C 2012-08-14 300 86.6
# 8 C 2012-09-29 475 475.
# 9 C 2012-10-24 25 109.
# 10 B 2016-05-22 50 50.0
The width argument of rollapply can be a vector of widths which can be set using findInterval
. An example of this is shown in the Examples section of the rollapply help file and we use that below.
library(dplyr)
library(psych)
library(zoo)
data_roll <- df %>%
arrange(FINAL_SITEID, DATE) %>%
group_by(FINAL_SITEID) %>%
mutate(GM30 = rollapplyr(RESULT, 1:n() - findInterval(DATE - 30, DATE),
geometric.mean, fill = NA)) %>%
ungroup
giving:
# A tibble: 22 x 4
FINAL_SITEID DATE RESULT GM30
<fct> <date> <dbl> <dbl>
1 A 2018-05-23 50 50.0
2 A 2018-06-05 900 212.
3 A 2018-06-17 25 104.
4 A 2018-08-20 25 25.0
5 A 2018-10-05 125 125.
6 B 2016-05-22 50 50.0
7 B 2016-06-20 25 35.4
8 B 2016-07-25 25 25.0
9 B 2016-08-11 2000 224.
10 B 2017-07-21 25 25.0
# ... with 12 more rows
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With