I have daily data with multiple categorical values, stored as a data frame:
YYYYMM Date ID Count
201401 01/01/2014 A 151
201401 01/01/2014 B 68
201401 01/01/2014 C 487
201401 02/01/2014 A 198
201401 02/01/2014 B 97
201401 02/01/2014 C 403
I am trying to plot a moving average against the actual values, using ggplot.
What I would like to do is create a 5th column in my data frame which has the mean in it.
I have tried this solution (found here: Constructing moving average over a categorical variable in R)
df$Mean<-0
library(plyr)
library(zoo)
ddply(
df, "ID",
transform,
df$Mean<-rollmean(Count, 7, na.pad=TRUE)
)
and it works, but it calculates the mean for every column in my data frame, and makes another data frame within the existing one, so I end up with something like this:
YYYYMM Date ID Count Mean.YYYYMM Mean.Date Mean.ID Mean.Count
201401 01/01/2014 A 151 201401 01/01/2014 B 58.90
201401 01/01/2014 B 68 201401 01/01/2014 B 62.05
201401 01/01/2014 C 487 201401 01/01/2014 B 61.84
201401 02/01/2014 A 198 201401 01/01/2014 B 58.02
201401 02/01/2014 B 97 201401 01/01/2014 B 57.65
201401 02/01/2014 C 403 201401 01/01/2014 B 59.65
When I try and plot this
for (var in unique(df$ID))
{
ggplot(df[df$ID==var,], aes(x=Date)) +
geom_line(aes(y=Count),color="blue") +
geom_line(aes(y=Mean$Count),color="grey",linetype="dashed") +
facet_wrap(~ID) +
theme_bw()
}
I get an error message. I'm wondering what I'm missing here, or if there is another way to go about this?
The 7-day moving average for any given day is calculated by taking the average sales of that day and the 6 previous days. The 14-day moving average is calculated by taking the average sales of the day in question and the previous 13 days. In R we can calculate this with a function called rollmean from the zoo package.
A 7-day moving average (MA) is a short term trend indicator. It is quite simply the average of closing prices of the last seven trading days.
SMA or simple moving average is an arithmetic moving average calculated by adding the recent prices and then dividing that value by the number of time periods in the calculation average.
You didn't supply enough data to create a weekly rolling mean within the groups, but in principle it could work like this:
library(tidyverse)
library(zoo)
my_data <- my_data %>%
group_by(ID) %>%
mutate(roll_mean = rollmean(Count, 2, na.pad = T))
Using dplyr
you group_by
your ID variable, and then create a single new column with the rolling mean. You can plot this then with standard ggplot2
-syntax:
ggplot(my_data, aes(Date, Count, group = 1)) +
geom_line(colour = "blue") +
geom_point(colour = "blue") +
geom_point(aes(y = roll_mean), colour = "red") +
facet_wrap(~ID)
#> Warning: Removed 3 rows containing missing values (geom_point).
zzz <- "YYYYMM Date ID Count
201401 01/01/2014 A 151
201401 01/01/2014 B 68
201401 01/01/2014 C 487
201401 02/01/2014 A 198
201401 02/01/2014 B 97
201401 02/01/2014 C 403"
my_data <- read_table(zzz)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With