I have a data frame that holds a history of runs. Some of the variables include a date (in POSIXct) and a value for that run (here, size). I want to produce various graphs showing a line based on the total of the size column for a particular date range. Ideally I'd like to use the same dataset and switch between totals per week, 2 weeks, month or quarter.
Here's an example dataset:
require(ggplot2)
set.seed(666)
today <- Sys.time()  # end of the one-year window sampled below
foo<-data.frame(Date=sample(seq(today-(365*24*60*60), today, by="day"),50, replace=FALSE),
value=rnorm(50, mean=100, sd=25),
type=sample(c("Red", "Blue", "Green"), 50, replace=TRUE))
I can create this plot, which shows individual values:
ggplot(data=foo, aes(x=Date, y=value, colour=type))+stat_summary(fun.y=sum, geom="line")
Or I can do this to show a sum per month:
ggplot(data=foo, aes(x=format(Date, "%m %y"), y=value, colour=type))+stat_summary(fun.y=sum, geom="line", aes(group=type))
However, it gets more complicated to do sums per quarter, per 2 weeks and so on. Ideally I'd like something like stat_bin and stat_summary combined, so I could specify a binwidth (or have ggplot make a best guess based on the range).
Am I missing something obvious, or is this just not possible?
It's pretty easy with plyr and lubridate to do all the calculations yourself:
library(plyr)
library(lubridate)
foo <- data.frame(
date = sample(today() + days(1:365), 50, replace = FALSE),
value = rnorm(50, mean = 100, sd = 25),
type = sample(c("Red", "Blue", "Green"), 50, replace = TRUE))
foo$date2 <- floor_date(foo$date, "week")  # round each date down to the start of its week
foosum <- ddply(foo, c("date2", "type"), summarise,
n = length(value),
mean = mean(value))
ggplot(foosum, aes(date2, mean, colour = type)) +
geom_point(aes(size = n)) +
geom_line()
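If totals rather than means are wanted, and the bin width should be easy to swap, the same idea can be sketched in base R with cut(), which accepts break strings such as "week", "2 weeks", "month" and "quarter" for Date objects. This is only a rough sketch: sum_by_period and the demo data frame are made-up names for illustration, and it assumes the date column converts cleanly with as.Date() (for POSIXct that conversion uses UTC unless told otherwise).

```r
# Hypothetical helper (not part of plyr, lubridate or ggplot2):
# sums `value` per `type` within date bins of the given width.
# `width` is any breaks string that cut() accepts for Dates,
# e.g. "week", "2 weeks", "month", "quarter".
sum_by_period <- function(df, width) {
  # cut() returns a factor labelled with the start date of each bin;
  # convert the labels back to Date so they plot on a date axis.
  df$period <- as.Date(cut(as.Date(df$date), breaks = width))
  aggregate(value ~ period + type, data = df, FUN = sum)
}

# Illustrative data with the same shape as foo above
set.seed(1)
demo <- data.frame(
  date  = as.Date("2024-01-01") + sample(0:364, 50),
  value = rnorm(50, mean = 100, sd = 25),
  type  = sample(c("Red", "Blue", "Green"), 50, replace = TRUE)
)

weekly      <- sum_by_period(demo, "week")
fortnightly <- sum_by_period(demo, "2 weeks")
quarterly   <- sum_by_period(demo, "quarter")
```

Each result can then be plotted the same way as foosum, for example with ggplot(fortnightly, aes(period, value, colour = type)) + geom_line(), so only the width string changes between plots.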
The chron package can be very useful for converting dates in ways not covered by the "basic" format command. But the latter can also do smart things (like strftime in PHP), e.g.:
Show the year and month of a date:
format(foo$Date, "%Y-%m")
And the appropriate quarter of the year (quarters() also works on POSIXct and Date objects in base R, without chron):
quarters(foo$Date)
To compute the 2-week period, you might not find a ready-made function, but it can be computed easily from the week number, e.g.:
floor(as.numeric(format(foo$Date, "%V"))/2)+1
After computing the new variables in the dataframe, you could easily plot your data just like in your original example.
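As a rough illustration of that last step, the sketch below adds the period columns and sums value per period and type in base R. The runs data frame and the column names quarter and fortnight are invented for the example. The fortnight here is counted from the earliest observation rather than from the week number, which is a swapped-in alternative that does not reset at New Year.

```r
# Illustrative data shaped like the question's foo (POSIXct dates)
set.seed(1)
runs <- data.frame(
  Date  = as.POSIXct("2024-01-01", tz = "UTC") + sample(0:364, 50) * 86400,
  value = rnorm(50, mean = 100, sd = 25),
  type  = sample(c("Red", "Blue", "Green"), 50, replace = TRUE)
)

# Year plus quarter, e.g. "2024 Q1"; including the year keeps
# the same quarter of different years in separate groups.
runs$quarter <- paste(format(runs$Date, "%Y"), quarters(runs$Date))

# Two-week bin index (0, 1, 2, ...) counted from the earliest date
days_in <- as.numeric(difftime(runs$Date, min(runs$Date), units = "days"))
runs$fortnight <- days_in %/% 14

# Totals per period and type, ready for plotting
per_quarter   <- aggregate(value ~ quarter + type, data = runs, FUN = sum)
per_fortnight <- aggregate(value ~ fortnight + type, data = runs, FUN = sum)
```

The summaries plot like the original example, e.g. ggplot(per_quarter, aes(quarter, value, colour = type, group = type)) + geom_line().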