Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

rolling 30-day geometric mean with variable width

The solution to this question by @ShirinYavari was almost what I needed except for the use of the static averaging window width of 2. I have a dataset with random samples from multiple stations that I want to calculate a rolling 30-day geomean. I want all samples within a 30-day window of a given sample to be averaged and the width may change if preceding samples are farther or closer together in time, for instance whether you would need to average 2, 3, or more samples if 1, 2, or more preceding samples were within 30 days of a given sample.

Here is some example data, plus my code attempt:

RESULT = c(50,900,25,25,125,50,25,25,2000,25,25,
        25,25,25,25,25,25,325,25,300,475,25)
DATE = as.Date(c("2018-05-23","2018-06-05","2018-06-17",
                  "2018-08-20","2018-10-05","2016-05-22",
                  "2016-06-20","2016-07-25","2016-08-11",
                  "2017-07-21","2017-08-08","2017-09-18",
                  "2017-10-12","2011-04-19","2011-06-29",
                  "2011-08-24","2011-10-23","2012-06-28",
                  "2012-07-16","2012-08-14","2012-09-29",
                  "2012-10-24"))
FINAL_SITEID = c(rep("A", 5), rep("B", 8), rep("C", 9))
df=data.frame(FINAL_SITEID,DATE,RESULT)

data_roll <- df %>%
  group_by(FINAL_SITEID) %>%
  arrange(DATE) %>%
  mutate(day=DATE-dplyr::lag(DATE, n=1),
         day=replace_na(day, 1),
         rnk=cumsum(c(TRUE, day > 30))) %>%
  group_by(FINAL_SITEID, rnk) %>%
  mutate(count=rowid(rnk)) %>%
  mutate(GM30=rollapply(RESULT, width=count, geometric.mean, fill=RESULT, align="right"))

I get this error message, which seems like it should be an easy fix, but I can't figure it out:

Error: Column `rnk` must be length 5 (the group size) or one, not 6
like image 536
Laura Diemer Avatar asked Feb 08 '20 11:02

Laura Diemer


2 Answers

Easiest way to compute rolling statistics depending on datetime windows is runner package. You don't have to hack around to get just 30-days windows. Function runner allows you to apply any R function in rolling window. Below example of 30-days geometric.mean within FINAL_SITEID group:

library(psych)
library(runner)
df %>%
  group_by(FINAL_SITEID) %>%
  arrange(DATE) %>%
  mutate(GM30 = runner(RESULT, k = 30, idx = DATE, f = geometric.mean))

#     FINAL_SITEID DATE       RESULT  GM30
#    <fct>        <date>      <dbl> <dbl>
# 1 C            2011-04-19     25  25.0
# 2 C            2011-06-29     25  25.0
# 3 C            2011-08-24     25  25.0
# 4 C            2011-10-23     25  25.0
# 5 C            2012-06-28    325 325. 
# 6 C            2012-07-16     25  90.1
# 7 C            2012-08-14    300  86.6
# 8 C            2012-09-29    475 475. 
# 9 C            2012-10-24     25 109. 
# 10 B            2016-05-22     50  50.0
like image 198
GoGonzo Avatar answered Sep 30 '22 01:09

GoGonzo


The width argument of rollapply can be a vector of widths which can be set using findInterval. An example of this is shown in the Examples section of the rollapply help file and we use that below.

library(dplyr)
library(psych)
library(zoo)

data_roll <- df %>%
  arrange(FINAL_SITEID, DATE) %>%
  group_by(FINAL_SITEID) %>%
  mutate(GM30 = rollapplyr(RESULT, 1:n() - findInterval(DATE - 30, DATE), 
   geometric.mean, fill = NA)) %>%
  ungroup

giving:

# A tibble: 22 x 4
   FINAL_SITEID DATE       RESULT  GM30
   <fct>        <date>      <dbl> <dbl>
 1 A            2018-05-23     50  50.0
 2 A            2018-06-05    900 212. 
 3 A            2018-06-17     25 104. 
 4 A            2018-08-20     25  25.0
 5 A            2018-10-05    125 125. 
 6 B            2016-05-22     50  50.0
 7 B            2016-06-20     25  35.4
 8 B            2016-07-25     25  25.0
 9 B            2016-08-11   2000 224. 
10 B            2017-07-21     25  25.0
# ... with 12 more rows
like image 38
G. Grothendieck Avatar answered Sep 30 '22 00:09

G. Grothendieck