Below is a script to take a persons stats and do a rolling average by the last 6 days. I would like to have the closest dates to today have more of an impact than later dates.
If it is possible:
There are two ways of creating the rolling average below one_df and two_df, I use the first one in my actual script but I added the second in case that would be easier the write in the weight functions.
library(dplyr)
library(lubridate)
# Create DataFrame
df<- data.frame(name=c('CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE',
'JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH',
'JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON',
'SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON'
),
GA=c(3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20),
SV=c(3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20),
GF=c(3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20),
SA=c(3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20,3,2,1,1,2,3,20),
date=c("10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016","10/14/2016",
"10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016","10/14/2016",
"10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016","10/14/2016",
"10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016","10/14/2016"
),
stringsAsFactors = FALSE)
one_df <- df %>%
group_by(name) %>%
arrange(name, mdy(date)) %>%
summarise_at(2:5, funs(mean(tail(., 6))))
two_df <- df %>%
group_by(name) %>%
top_n(mdy(date), n = 6) %>%
summarise_at(2:5, mean)
DF:
name GA SV GF SA date
CAREY.FAKE 3 3 3 3 10/20/2016
CAREY.FAKE 2 2 2 2 10/19/2016
CAREY.FAKE 1 1 1 1 10/18/2016
CAREY.FAKE 1 1 1 1 10/17/2016
CAREY.FAKE 2 2 2 2 10/16/2016
CAREY.FAKE 3 3 3 3 10/15/2016
CAREY.FAKE 20 20 20 20 10/14/2016
JOHN.SMITH 3 3 3 3 10/20/2016
JOHN.SMITH 2 2 2 2 10/19/2016
JOHN.SMITH 1 1 1 1 10/18/2016
JOHN.SMITH 1 1 1 1 10/17/2016
JOHN.SMITH 2 2 2 2 10/16/2016
JOHN.SMITH 3 3 3 3 10/15/2016
JOHN.SMITH 20 20 20 20 10/14/2016
JEFF.JOHNSON 3 3 3 3 10/20/2016
JEFF.JOHNSON 2 2 2 2 10/19/2016
JEFF.JOHNSON 1 1 1 1 10/18/2016
JEFF.JOHNSON 1 1 1 1 10/17/2016
JEFF.JOHNSON 2 2 2 2 10/16/2016
JEFF.JOHNSON 3 3 3 3 10/15/2016
JEFF.JOHNSON 20 20 20 20 10/14/2016
SARA.JOHNSON 3 3 3 3 10/20/2016
SARA.JOHNSON 2 2 2 2 10/19/2016
SARA.JOHNSON 1 1 1 1 10/18/2016
SARA.JOHNSON 1 1 1 1 10/17/2016
SARA.JOHNSON 2 2 2 2 10/16/2016
SARA.JOHNSON 3 3 3 3 10/15/2016
SARA.JOHNSON 20 20 20 20 10/14/2016
RESULTS:
name GA SV GF SA
CAREY.FAKE 2 2 2 2
JEFF.JOHNSON 2 2 2 2
JOHN.SMITH 2 2 2 2
SARA.JOHNSON 2 2 2 2
EXPECTED RESULTS:
name GA SV GF SA
CAREY.FAKE 2.05 2.05 2.05 2.05
JEFF.JOHNSON 2.05 2.05 2.05 2.05
JOHN.SMITH 2.05 2.05 2.05 2.05
SARA.JOHNSON 2.05 2.05 2.05 2.05
Getting your outcome can be done with the weighted moving average WMA function from the TTR package. The weights are applied to the records records selected for the period length (n = 6). The weights should be of the same length as the period.
library(dplyr)
library(lubridate)
library(purrr)
df %>%
group_by(name) %>%
arrange(name, mdy(date)) %>%
mutate_at(2:5, TTR::WMA, n = 6, wts = c(.2, .2, .3, .3, .5, .5))
# A tibble: 28 x 6
# Groups: name [4]
name GA SV GF SA date
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 CAREY.FAKE NA NA NA NA 10/14/2016
2 CAREY.FAKE NA NA NA NA 10/15/2016
3 CAREY.FAKE NA NA NA NA 10/16/2016
4 CAREY.FAKE NA NA NA NA 10/17/2016
5 CAREY.FAKE NA NA NA NA 10/18/2016
6 CAREY.FAKE 3.50 3.50 3.50 3.50 10/19/2016
7 CAREY.FAKE 2.05 2.05 2.05 2.05 10/20/2016
8 JEFF.JOHNSON NA NA NA NA 10/14/2016
9 JEFF.JOHNSON NA NA NA NA 10/15/2016
10 JEFF.JOHNSON NA NA NA NA 10/16/2016
# ... with 18 more rows
Or with the NA's filtered out:
df %>%
group_by(name) %>%
arrange(name, mdy(date)) %>%
mutate_at(2:5, TTR::WMA, n = 6, wts = c(.2, .2, .3, .3, .5, .5)) %>%
filter(!is.na(GA))
# A tibble: 8 x 6
# Groups: name [4]
name GA SV GF SA date
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 CAREY.FAKE 3.50 3.50 3.50 3.50 10/19/2016
2 CAREY.FAKE 2.05 2.05 2.05 2.05 10/20/2016
3 JEFF.JOHNSON 3.50 3.50 3.50 3.50 10/19/2016
4 JEFF.JOHNSON 2.05 2.05 2.05 2.05 10/20/2016
5 JOHN.SMITH 3.50 3.50 3.50 3.50 10/19/2016
6 JOHN.SMITH 2.05 2.05 2.05 2.05 10/20/2016
7 SARA.JOHNSON 3.50 3.50 3.50 3.50 10/19/2016
8 SARA.JOHNSON 2.05 2.05 2.05 2.05 10/20/2016
If there are not enough values for the period window, we can create a function and wrap this in purrr's possible function to return NA's when the function fails. In the example below I removed 2 records from "CAREY.FAKE" to show the result.
my_func <- function(x){
TTR::WMA(x, n = 6, wts = c(.2, .2, .3, .3, .5, .5))
}
df %>%
group_by(name) %>%
arrange(name, mdy(date)) %>%
mutate_at(2:5, possibly(my_func, otherwise = NA_real_))
# A tibble: 26 x 6
# Groups: name [4]
name GA SV GF SA date
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 CAREY.FAKE NA NA NA NA 10/14/2016
2 CAREY.FAKE NA NA NA NA 10/15/2016
3 CAREY.FAKE NA NA NA NA 10/16/2016
4 CAREY.FAKE NA NA NA NA 10/17/2016
5 CAREY.FAKE NA NA NA NA 10/18/2016
6 JEFF.JOHNSON NA NA NA NA 10/14/2016
7 JEFF.JOHNSON NA NA NA NA 10/15/2016
8 JEFF.JOHNSON NA NA NA NA 10/16/2016
9 JEFF.JOHNSON NA NA NA NA 10/17/2016
10 JEFF.JOHNSON NA NA NA NA 10/18/2016
# ... with 16 more rows
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With