Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weighted Moving Average by date R

Tags:

r

Below is a script to take a persons stats and do a rolling average by the last 6 days. I would like to have the closest dates to today have more of an impact than later dates.

If it is possible:

  • the closest 2 events by date will have a weight of .50 (50%)
  • the 2nd closest dates will have a weight of .30 (30%)
  • the furthest would have a weight of .20 (20%).

There are two ways of creating the rolling average below one_df and two_df, I use the first one in my actual script but I added the second in case that would be easier the write in the weight functions.

library(dplyr)
library(lubridate)

# Create DataFrame


df<- data.frame(name=c('CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE',
                       'JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH',
                       'JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON',
                       'SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON'
),
GA=c(3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20),
SV=c(3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20),
GF=c(3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20),
SA=c(3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20),
date=c("10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016","10/14/2016",
       "10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016","10/14/2016",
       "10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016","10/14/2016",
       "10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016","10/14/2016"
),
stringsAsFactors = FALSE)

one_df <- df %>%
  group_by(name) %>%
  arrange(name, mdy(date)) %>% 
  summarise_at(2:5, funs(mean(tail(., 6))))

two_df <- df %>% 
  group_by(name) %>%
  top_n(mdy(date), n = 6) %>%
  summarise_at(2:5, mean)

DF:

    name        GA  SV  GF  SA  date
CAREY.FAKE      3   3   3   3   10/20/2016
CAREY.FAKE      2   2   2   2   10/19/2016
CAREY.FAKE      1   1   1   1   10/18/2016
CAREY.FAKE      1   1   1   1   10/17/2016
CAREY.FAKE      2   2   2   2   10/16/2016
CAREY.FAKE      3   3   3   3   10/15/2016
CAREY.FAKE      20  20  20  20  10/14/2016
JOHN.SMITH      3   3   3   3   10/20/2016
JOHN.SMITH      2   2   2   2   10/19/2016
JOHN.SMITH      1   1   1   1   10/18/2016
JOHN.SMITH      1   1   1   1   10/17/2016
JOHN.SMITH      2   2   2   2   10/16/2016
JOHN.SMITH      3   3   3   3   10/15/2016
JOHN.SMITH      20  20  20  20  10/14/2016
JEFF.JOHNSON    3   3   3   3   10/20/2016
JEFF.JOHNSON    2   2   2   2   10/19/2016
JEFF.JOHNSON    1   1   1   1   10/18/2016
JEFF.JOHNSON    1   1   1   1   10/17/2016
JEFF.JOHNSON    2   2   2   2   10/16/2016
JEFF.JOHNSON    3   3   3   3   10/15/2016
JEFF.JOHNSON    20  20  20  20  10/14/2016
SARA.JOHNSON    3   3   3   3   10/20/2016
SARA.JOHNSON    2   2   2   2   10/19/2016
SARA.JOHNSON    1   1   1   1   10/18/2016
SARA.JOHNSON    1   1   1   1   10/17/2016
SARA.JOHNSON    2   2   2   2   10/16/2016
SARA.JOHNSON    3   3   3   3   10/15/2016
SARA.JOHNSON    20  20  20  20  10/14/2016

RESULTS:

name            GA  SV  GF  SA
CAREY.FAKE      2   2   2   2
JEFF.JOHNSON    2   2   2   2
JOHN.SMITH      2   2   2   2
SARA.JOHNSON    2   2   2   2

EXPECTED RESULTS:

name             GA   SV    GF   SA
CAREY.FAKE      2.05 2.05  2.05 2.05
JEFF.JOHNSON    2.05 2.05  2.05 2.05
JOHN.SMITH      2.05 2.05  2.05 2.05
SARA.JOHNSON    2.05 2.05  2.05 2.05
like image 438
Michael T Johnson Avatar asked Oct 22 '25 06:10

Michael T Johnson


1 Answers

Getting your outcome can be done with the weighted moving average WMA function from the TTR package. The weights are applied to the records records selected for the period length (n = 6). The weights should be of the same length as the period.

library(dplyr)
library(lubridate)
library(purrr)

df %>% 
  group_by(name) %>%
  arrange(name, mdy(date)) %>% 
  mutate_at(2:5, TTR::WMA, n = 6, wts = c(.2, .2, .3, .3, .5, .5))

# A tibble: 28 x 6
# Groups:   name [4]
   name            GA    SV    GF    SA date      
   <chr>        <dbl> <dbl> <dbl> <dbl> <chr>     
 1 CAREY.FAKE   NA    NA    NA    NA    10/14/2016
 2 CAREY.FAKE   NA    NA    NA    NA    10/15/2016
 3 CAREY.FAKE   NA    NA    NA    NA    10/16/2016
 4 CAREY.FAKE   NA    NA    NA    NA    10/17/2016
 5 CAREY.FAKE   NA    NA    NA    NA    10/18/2016
 6 CAREY.FAKE    3.50  3.50  3.50  3.50 10/19/2016
 7 CAREY.FAKE    2.05  2.05  2.05  2.05 10/20/2016
 8 JEFF.JOHNSON NA    NA    NA    NA    10/14/2016
 9 JEFF.JOHNSON NA    NA    NA    NA    10/15/2016
10 JEFF.JOHNSON NA    NA    NA    NA    10/16/2016
# ... with 18 more rows

Or with the NA's filtered out:

df %>% 
  group_by(name) %>%
  arrange(name, mdy(date)) %>% 
  mutate_at(2:5, TTR::WMA, n = 6, wts = c(.2, .2, .3, .3, .5, .5)) %>% 
  filter(!is.na(GA))

# A tibble: 8 x 6
# Groups:   name [4]
  name            GA    SV    GF    SA date      
  <chr>        <dbl> <dbl> <dbl> <dbl> <chr>     
1 CAREY.FAKE    3.50  3.50  3.50  3.50 10/19/2016
2 CAREY.FAKE    2.05  2.05  2.05  2.05 10/20/2016
3 JEFF.JOHNSON  3.50  3.50  3.50  3.50 10/19/2016
4 JEFF.JOHNSON  2.05  2.05  2.05  2.05 10/20/2016
5 JOHN.SMITH    3.50  3.50  3.50  3.50 10/19/2016
6 JOHN.SMITH    2.05  2.05  2.05  2.05 10/20/2016
7 SARA.JOHNSON  3.50  3.50  3.50  3.50 10/19/2016
8 SARA.JOHNSON  2.05  2.05  2.05  2.05 10/20/2016

EDIT:

If there are not enough values for the period window, we can create a function and wrap this in purrr's possible function to return NA's when the function fails. In the example below I removed 2 records from "CAREY.FAKE" to show the result.

my_func <- function(x){
  TTR::WMA(x, n = 6, wts = c(.2, .2, .3, .3, .5, .5))
}

df %>% 
  group_by(name) %>%
  arrange(name, mdy(date)) %>% 
  mutate_at(2:5, possibly(my_func, otherwise = NA_real_))

# A tibble: 26 x 6
# Groups:   name [4]
   name            GA    SV    GF    SA date      
   <chr>        <dbl> <dbl> <dbl> <dbl> <chr>     
 1 CAREY.FAKE      NA    NA    NA    NA 10/14/2016
 2 CAREY.FAKE      NA    NA    NA    NA 10/15/2016
 3 CAREY.FAKE      NA    NA    NA    NA 10/16/2016
 4 CAREY.FAKE      NA    NA    NA    NA 10/17/2016
 5 CAREY.FAKE      NA    NA    NA    NA 10/18/2016
 6 JEFF.JOHNSON    NA    NA    NA    NA 10/14/2016
 7 JEFF.JOHNSON    NA    NA    NA    NA 10/15/2016
 8 JEFF.JOHNSON    NA    NA    NA    NA 10/16/2016
 9 JEFF.JOHNSON    NA    NA    NA    NA 10/17/2016
10 JEFF.JOHNSON    NA    NA    NA    NA 10/18/2016
# ... with 16 more rows
like image 93
phiver Avatar answered Oct 25 '25 01:10

phiver