 

Is it possible to 'bin' values by date to get a total per 2 weeks in ggplot2 and R?

Tags:

r

ggplot2

I have a data frame which is a history of runs. Some of the variables include a date (in POSIXct) and a value for that run (here, the value column). I want to produce various graphs showing a line based on the total of the value column for a particular date range. Ideally I'd like to use the same dataset and switch between totals per week, 2 weeks, month, or quarter.

Here's an example dataset:

library(ggplot2)
set.seed(666)

# reference point for the date range: now, as POSIXct
today <- Sys.time()

foo <- data.frame(Date = sample(seq(today - (365*24*60*60), today, by = "day"), 50, replace = FALSE),
        value = rnorm(50, mean = 100, sd = 25),
        type = sample(c("Red", "Blue", "Green"), 50, replace = TRUE))

I can create this plot, which shows the individual values:

# fun = sum needs ggplot2 >= 3.3.0; older versions spell it fun.y = sum
ggplot(data = foo, aes(x = Date, y = value, colour = type)) + stat_summary(fun = sum, geom = "line")

Or I can do this to show a sum per month:

# "%Y-%m" (rather than "%m %y") keeps the month labels in chronological order
ggplot(data = foo, aes(x = format(Date, "%Y-%m"), y = value, colour = type)) + stat_summary(fun = sum, geom = "line", aes(group = type))

However, it gets more complicated to do sums per quarter, 2 weeks, etc. Ideally I'd like something like stat_bin and stat_summary combined, so I could specify a binwidth (or have ggplot make a best guess based on the range).

Am I missing something obvious, or is this just not possible?

asked Jan 21 '11 19:01 by PaulHurleyuk




2 Answers

It's pretty easy with plyr and lubridate to do all the calculations yourself:

library(plyr)
library(lubridate)
library(ggplot2)

foo <- data.frame(
  date = sample(today() + days(1:365), 50, replace = FALSE),
  value = rnorm(50, mean = 100, sd = 25),
  type = sample(c("Red", "Blue", "Green"), 50, replace = TRUE))

# round each date down to the start of its week
foo$date2 <- floor_date(foo$date, "week")
# one row per week and type: number of runs and their mean value
foosum <- ddply(foo, c("date2", "type"), summarise,
  n = length(value),
  mean = mean(value))

ggplot(foosum, aes(date2, mean, colour = type)) +
  geom_point(aes(size = n)) +
  geom_line()
answered Oct 02 '22 16:10 by hadley
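A base-R variation on the same idea (an illustrative sketch, not from the original answers): cut() on a Date vector accepts interval strings such as "week", "2 weeks", "month" or "quarter", so the bin width becomes a single argument, much like the binwidth the question asks for. The helper name total_by_period() is hypothetical, and the sketch assumes a data frame with date, value and type columns as in the example above.

```r
# Hypothetical helper: sum `value` per `type` within calendar bins of width
# `period` ("week", "2 weeks", "month", "quarter", ...), using base R only.
total_by_period <- function(df, period) {
  # cut.Date() returns a factor whose levels are the start date of each bin;
  # week-based bins start on Sundays here (start.on.monday = FALSE)
  bins <- cut(df$date, breaks = period, start.on.monday = FALSE)
  df$bin <- as.Date(as.character(bins))
  aggregate(value ~ bin + type, data = df, FUN = sum)
}

set.seed(666)
foo <- data.frame(
  date  = sample(Sys.Date() + 1:365, 50, replace = FALSE),
  value = rnorm(50, mean = 100, sd = 25),
  type  = sample(c("Red", "Blue", "Green"), 50, replace = TRUE)
)

fortnightly <- total_by_period(foo, "2 weeks")
quarterly   <- total_by_period(foo, "quarter")

# Plot the two-weekly totals only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(fortnightly, aes(bin, value, colour = type)) + geom_line()
  print(p)
}
```

Switching granularity is then just a matter of changing the period string, e.g. total_by_period(foo, "month").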


The chron package can be very useful for converting dates in ways not covered by the "basic" format() function. But format() can also do smart things (like strftime in PHP), e.g.:

Show the year and month of a date:

format(foo$Date, "%Y-%m")

And quarters() shows the appropriate quarter of the year (for Date and POSIXct values this works in base R; chron is not needed):

quarters(foo$Date)
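Because the example dates span more than one calendar year, a month or quarter label on its own would merge, say, January of two different years. A small sketch, assuming a Date (or POSIXct) vector like foo$Date: prefix the label with the year, which also makes the text labels sort chronologically.

```r
# Sample dates spanning a year boundary
d <- as.Date(c("2010-11-15", "2011-01-21", "2011-02-03", "2011-08-30"))

# Year-month labels ("%Y-%m" sorts chronologically as plain text)
month_label <- format(d, "%Y-%m")

# Year-quarter labels built from base R's quarters(), which returns "Q1".."Q4"
quarter_label <- paste(format(d, "%Y"), quarters(d))

month_label
# [1] "2010-11" "2011-01" "2011-02" "2011-08"
quarter_label
# [1] "2010 Q4" "2011 Q1" "2011 Q1" "2011 Q3"
```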

To compute the 2-week period you might not find a ready-made function, but it can easily be computed from the week number, e.g.:

# ISO weeks 1-2 -> period 1, weeks 3-4 -> period 2, ...
floor((as.numeric(format(foo$Date, "%V")) - 1)/2) + 1
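The week number from "%V" carries no year, so with data spanning a year boundary, periods from different years would be merged. A minimal sketch, assuming your platform's date formatting supports "%G" (the ISO 8601 week-based year) alongside "%V", as glibc does: combine both into one label. The helper name two_week_period() is hypothetical. Note that in years with an ISO week 53, that week forms a 27th period on its own.

```r
# Hypothetical helper: label each date with its ISO week-based year and a
# two-week period number (ISO weeks 1-2 -> P01, weeks 3-4 -> P02, ...).
two_week_period <- function(x) {
  iso_year <- format(x, "%G")               # ISO 8601 week-based year
  iso_week <- as.integer(format(x, "%V"))   # ISO 8601 week number, 1..53
  period   <- (iso_week - 1L) %/% 2L + 1L
  sprintf("%s-P%02d", iso_year, period)
}

# 2011-01-01 is a Saturday and still belongs to ISO week 52 of 2010
d <- as.Date(c("2011-01-01", "2011-01-03", "2011-01-10", "2011-01-17"))
two_week_period(d)
# [1] "2010-P26" "2011-P01" "2011-P01" "2011-P02"
```

The result can be stored as a column, e.g. foo$period <- two_week_period(foo$Date), and used as the x aesthetic in the monthly stat_summary() call from the question.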

After computing the new variables in the dataframe, you could easily plot your data just like in your original example.

answered Oct 02 '22 15:10 by daroczig