plot running average in ggplot2

Tags: r, ggplot2

I'm hoping to create a plot that shows a running average over a scatterplot of the observed data. The data consists of observations of hares' coat color (Color) over time (Julian).

Color  Julian
50  85
50  87
50  89
50  90
100 91
50  91
50  92
50  92
100 92
50  93
100 93
50  93
50  95
100 95
50  95
50  96
50  96
50  99
50  100
0   101
0   101
0   103
50  103
50  104
50  104
50  104
50  104
100 104
100 104
50  109
50  109
100 109
0   110
0   110
50  110
50  110
50  110
50  110
0   112

A friend wrote a function for me that calculates a running average of the color observations, but I can't figure out how to add the line (haresAveNoNa) into the plot.

The function:

# Columns: 1 = Julian day, 2 = mean Color, 3 = sd of Color, each over days i-3 to i+3
haresAverage <- matrix(NA, max(hares$Julian), 3)
for (i in 4:max(hares$Julian)) {
  inWindow <- hares$Julian >= (i - 3) & hares$Julian <= (i + 3)
  haresAverage[i, 1] <- i
  haresAverage[i, 2] <- mean(hares$Color[inWindow], na.rm = TRUE)
  haresAverage[i, 3] <- sd(hares$Color[inWindow], na.rm = TRUE)
}
# Drop days with no mean or no sd (windows with fewer than two observations)
haresAveNoNa <- na.omit(haresAverage)

The plot:

p <- ggplot(hares, aes(Julian, Color))
p  +
  geom_jitter(width = 1, height = 5, color="blue", alpha = .65) 

Can you please help me add the running average 'haresAveNoNa' into the plot? Thanks very much!

Kestrel1 asked Nov 29 '16 03:11




1 Answer

You can calculate the rolling mean with rollmean from the zoo package instead of writing your own function. You can either call rollmean on the fly, inside ggplot, to add the rolling-mean line, or add the rolling-mean values to your data frame and then plot them. Examples of both methods are below. The code calculates a centered rolling mean over a window of seven consecutive observations (rows). That equals a seven-day window only when the data are in date order with exactly one observation per day, which is not the case for the data in the question, so treat the line as a smoother over neighbouring observations. You can change the window size with the k argument and get a left- or right-aligned rolling mean, rather than a centered one, with the align argument.
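To make the window and alignment options concrete, here is a small sketch on a toy vector (the vector x and the variable names are made up for illustration; only rollmean and its k, fill and align arguments come from zoo):

```r
library(zoo)

x <- c(1, 2, 3, 4, 5, 6, 7)

# Centered window of 3: each value is the mean of itself and its two neighbours
centered <- rollmean(x, k = 3, fill = NA, align = "center")

# Right-aligned: each value is the mean of itself and the two values before it
trailing <- rollmean(x, k = 3, fill = NA, align = "right")

# Left-aligned: each value is the mean of itself and the two values after it
leading <- rollmean(x, k = 3, fill = NA, align = "left")

print(centered)  # NA  2  3  4  5  6 NA
print(trailing)  # NA NA  2  3  4  5  6
print(leading)   #  2  3  4  5  6 NA NA
```

The NA padding keeps the result the same length as the input, which is what lets it be used directly as a y aesthetic or a data frame column.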

Calculate rolling mean on the fly within ggplot

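library(zoo)
library(ggplot2)

# fill=NA pads the ends so the result has one value per row;
# it replaces the deprecated na.pad=TRUE
ggplot(hares, aes(Julian, Color)) + 
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(aes(y=rollmean(Color, 7, fill=NA))) +
  theme_bw()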


Add rolling mean to your data frame as a new column and then plot it

To answer your specific question, let's say you actually do need to add the rolling mean line from separate data, rather than calculate it on the fly. If the rolling mean is another column in your data frame, you just need to give the new column name to geom_line:

hares$roll7 = rollmean(hares$Color, 7, fill=NA)

ggplot(hares, aes(Julian, Color)) + 
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(aes(y=roll7)) +
  theme_bw()
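Note that rollmean counts rows, not days, so with several observations per day a width of 7 does not span a week. If you want a window defined in days, as in the loop from the question, one possible sketch in base R is below. It uses a toy subset of the question's data, and the column name day7 is made up for illustration:

```r
# Toy subset of the question's data (first nine rows only), for illustration
hares <- data.frame(
  Color  = c(50, 50, 50, 50, 100, 50, 50, 50, 100),
  Julian = c(85, 87, 89, 90, 91, 91, 92, 92, 92)
)

# For each row, average every observation within +/- 3 days of that row's day
hares$day7 <- sapply(hares$Julian, function(d) {
  mean(hares$Color[abs(hares$Julian - d) <= 3], na.rm = TRUE)
})

print(hares)
```

Rows that share a Julian day get the same value, so geom_line(aes(y=day7)) draws a single well-defined line when added to the plot above.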

Add rolling mean to a plot using a separate data frame

If the rolling mean is in a separate data frame, you need to feed that data frame to geom_line:

haresAverage = data.frame(Julian=hares$Julian, 
                          Color=rollmean(hares$Color, 7, fill=NA))

ggplot(hares, aes(Julian, Color)) + 
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(data=haresAverage, aes(Julian, Color)) +
  theme_bw()
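The same idea should work for the haresAveNoNa object from the question. That object is a matrix with unnamed columns, so it has to be converted to a data frame first. The sketch below rebuilds it from a toy subset of the data; the names haresAveDf, Mean and SD are made up for illustration:

```r
library(ggplot2)

# Toy subset of the question's data (first nine rows only), for illustration
hares <- data.frame(
  Color  = c(50, 50, 50, 50, 100, 50, 50, 50, 100),
  Julian = c(85, 87, 89, 90, 91, 91, 92, 92, 92)
)

# Rebuild the question's matrix: column 1 = day, 2 = mean, 3 = sd over +/- 3 days
haresAverage <- matrix(NA, max(hares$Julian), 3)
for (i in 4:max(hares$Julian)) {
  in_window <- hares$Julian >= (i - 3) & hares$Julian <= (i + 3)
  haresAverage[i, 1] <- i
  haresAverage[i, 2] <- mean(hares$Color[in_window], na.rm = TRUE)
  haresAverage[i, 3] <- sd(hares$Color[in_window], na.rm = TRUE)
}
haresAveNoNa <- na.omit(haresAverage)

# ggplot2 needs a data frame, so convert the matrix and name its columns
haresAveDf <- as.data.frame(haresAveNoNa)
names(haresAveDf) <- c("Julian", "Mean", "SD")

# Scatterplot from hares, running-average line from the second data frame
p <- ggplot(hares, aes(Julian, Color)) +
  geom_jitter(width = 1, height = 5, color = "blue", alpha = 0.65) +
  geom_line(data = haresAveDf, aes(Julian, Mean))
print(p)
```

Because na.omit drops any day whose window holds fewer than two observations (the sd is NA there), the line only covers days with enough nearby data.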

UPDATE: To show date instead of the numeric Julian value

First, convert Julian to Date format. I don't know the actual mapping from Julian to date in your data, so for this example let's assume that Julian is the day of the year, counting the first day of the year as 1, and let's assume the year is 2015.

hares$Date = as.Date(hares$Julian - 1, origin="2015-01-01")
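A quick way to sanity-check the conversion, under the same assumptions (day-of-year numbering and the year 2015), is to convert a single value and read the day of year back out:

```r
# Day of year 85 in 2015 (the assumed year) should be March 26
d <- as.Date(85 - 1, origin = "2015-01-01")
date_string <- format(d)                      # "2015-03-26"
day_of_year <- as.integer(format(d, "%j"))    # 85, the original Julian value

print(date_string)
print(day_of_year)
```

If your Julian values use a different year or start counting from 0, only the origin or the subtraction needs to change.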

Now we plot using our new Date column for the x-axis. To customize both the number of breaks and the date labels, use scale_x_date.

ggplot(hares, aes(Date, Color)) + 
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(aes(y=rollmean(Color, 7, fill=NA))) +
  theme_bw() +
  scale_x_date(date_breaks="weeks", date_labels="%b %e")


eipi10 answered Oct 10 '22 17:10