
R Script to average value over every <x> days

Tags:

r

I'm having trouble working out how to calculate an average over every "x" days. When I plot this CSV file over one year, there is too much data to display clearly as a line (screenshot attached). I'd like to average the data over every few days (maybe 2, a week, etc.) so the line graph is easier to read. Any advice on how to do this in R?

results.csv

POSTS,PROVIDER,TYPE,DATE
29337,FTP,BLOG,2010-01-01
26725,FTP,BLOG,2010-01-02
27480,FTP,BLOG,2010-01-03
31187,FTP,BLOG,2010-01-04
31488,FTP,BLOG,2010-01-05
32461,FTP,BLOG,2010-01-06
33675,FTP,BLOG,2010-01-07
38897,FTP,BLOG,2010-01-08
37122,FTP,BLOG,2010-01-09
41365,FTP,BLOG,2010-01-10
51760,FTP,BLOG,2010-01-11
50859,FTP,BLOG,2010-01-12
53765,FTP,BLOG,2010-01-13
56836,FTP,BLOG,2010-01-14
59698,FTP,BLOG,2010-01-15
52095,FTP,BLOG,2010-01-16
57154,FTP,BLOG,2010-01-17
80755,FTP,BLOG,2010-01-18
227464,FTP,BLOG,2010-01-19
394510,FTP,BLOG,2010-01-20
371303,FTP,BLOG,2010-01-21
370450,FTP,BLOG,2010-01-22
268703,FTP,BLOG,2010-01-23
267252,FTP,BLOG,2010-01-24
375712,FTP,BLOG,2010-01-25
381041,FTP,BLOG,2010-01-26
380948,FTP,BLOG,2010-01-27
373140,FTP,BLOG,2010-01-28
361874,FTP,BLOG,2010-01-29
265178,FTP,BLOG,2010-01-30
269929,FTP,BLOG,2010-01-31

R Script

library(ggplot2);
data <- read.csv("results.csv", header=T);
dts <- as.POSIXct(data$DATE, format="%Y-%m-%d");
attach(data);
a <- ggplot(dataframe, aes(dts,POSTS/1000, fill = TYPE)) + opts(title = "Report") + labs(x = NULL, y = "Posts (k)", fill = NULL);
b <- a + geom_bar(stat = "identity", position = "stack");
plot_theme <- theme_update(axis.text.x = theme_text(angle=90, hjust=1), panel.grid.major = theme_line(colour = "grey90"), panel.grid.minor = theme_blank(), panel.background = theme_blank(), axis.ticks = theme_blank(), legend.position = "none");
c <- b + facet_grid(TYPE ~ ., scale = "free_y");
d <- c + scale_x_datetime(major = "1 months", format = "%Y %b");
ggsave(filename="/root/results.png",height=14,width=14,dpi=600);

Graph Image

[image: screenshot of the resulting plot]

Jeremy Carroll asked Feb 24 '11 14:02


1 Answer

Try this:

Average <- function(Data,n){
    # Make an index to be used for aggregating
    ID <- as.numeric(as.factor(Data$DATE))-1
    ID <- ID %/% n
    # aggregate over ID and TYPE for all numeric data.
    out <- aggregate(Data[sapply(Data,is.numeric)],
      by=list(ID,Data$TYPE),
      FUN=mean)
    # format output
    names(out)[1:2] <-c("dts","TYPE")
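    # add the correct dates as the beginning of every period
    # (looked up in the sorted unique dates, so row order and multiple TYPEs don't matter)
    out$dts <- as.POSIXct(levels(as.factor(Data$DATE))[(out$dts*n)+1])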
    out
}

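# read the csv under a name that doesn't mask a function, then average per 3 days
Data <- read.csv("results.csv", header=TRUE)
dataframe <- Average(Data,3)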

This works with the plot script you gave.
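
If the periods should be labelled by their starting date directly, a possible alternative (not the function above, just a sketch) is to bin the dates with base R's cut() and then aggregate. The data frame below is a hand-typed copy of the first nine rows of results.csv, and the names PERIOD and binned are made up for this illustration.

```r
# First nine days of results.csv, typed in so the example is self-contained
Data <- data.frame(
  POSTS = c(29337, 26725, 27480, 31187, 31488, 32461, 33675, 38897, 37122),
  PROVIDER = "FTP",
  TYPE = "BLOG",
  DATE = as.Date("2010-01-01") + 0:8
)

# cut() on a Date vector accepts an interval specification such as
# "3 days" or "week"; each bin is labelled with its first day
Data$PERIOD <- cut(Data$DATE, breaks = "3 days")

# Mean POSTS per period and TYPE
binned <- aggregate(POSTS ~ PERIOD + TYPE, data = Data, FUN = mean)

# Turn the factor labels back into date-times for plotting
binned$dts <- as.POSIXct(as.character(binned$PERIOD), tz = "UTC")
binned
```

Changing breaks to "week" or "2 days" changes the bin width without touching the rest of the code. Note that "week" snaps to calendar weeks (starting on Monday by default), whereas "3 days" counts from the earliest date in the data.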

Some remarks:

  • Never name a variable after an existing function (data, c, ...).
  • Avoid attach(). If you do use it, call detach() afterwards, or you'll get into trouble at some point. It is better to use with() and within().
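
To illustrate the second remark, here is a minimal sketch of with() and within() on three rows copied from results.csv. The names posts, mean_k and POSTS_K are invented for the example.

```r
# Three rows copied from results.csv; "posts" does not mask any function
posts <- data.frame(
  POSTS = c(29337, 26725, 27480),
  TYPE = "BLOG",
  DATE = c("2010-01-01", "2010-01-02", "2010-01-03")
)

# with() evaluates an expression using the columns of the data frame,
# without putting anything on the search path the way attach() does
mean_k <- with(posts, mean(POSTS / 1000))

# within() returns a modified copy of the data frame with the new columns
posts <- within(posts, {
  dts <- as.POSIXct(DATE, format = "%Y-%m-%d", tz = "UTC")
  POSTS_K <- POSTS / 1000
})
str(posts)
```

Because nothing is attached, there is nothing to detach afterwards, and the converted dates live in the data frame next to the values they belong to.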
Joris Meys answered Sep 30 '22 20:09