I'm trying to figure out how to do something with ggplot2 and R that seems like it should be really simple. It's so simple... that I cannot for the life of me figure out how to do it. I'm sure the answer is staring me in the face in the ggplot documentation, but I can't... find it. So. I'm here.
I frequently have datasets a lot like this:
# One month of per-second timestamps
tdf <- data.frame('datetime' = seq(from=as.POSIXct('2012-01-01 00:00:00'),
                                   to=as.POSIXct('2012-01-31 23:59:59'), by=1))
# Cycle through three categories and attach a random value to each row
tdf$variable <- rep(c('a','b','c'), length.out=length(tdf$datetime))
tdf$value <- sample(1:10, length(tdf$datetime), replace=TRUE)
> head(tdf)
datetime variable value
1 2012-01-01 00:00:00 a 7
2 2012-01-01 00:00:01 b 3
3 2012-01-01 00:00:02 c 7
4 2012-01-01 00:00:03 a 8
5 2012-01-01 00:00:04 b 2
6 2012-01-01 00:00:05 c 3
That is: I have a categorical variable (a factor), a value for that variable, and a timestamp at which that observation was recorded. I want to plot the sum of value for each level of the categorical variable within a given time "bucket", preferably using ggplot2. I would like to do it without pre-aggregating the data before I visualize it; that is, I really want the flexibility of leaving the dataset as it is and passing arguments to ggplot2 so that it aggregates at plot time. And yet, I'm completely flummoxed. The documentation on geom_line says to use stat='identity' so that value is plotted as-is rather than counted, but once I've done that I can no longer define any kind of bin. If I use stat_summary, I frequently don't get a plot back at all. The closest I've gotten is:
library(ggplot2)
tdf$variable <- factor(tdf$variable)
vis <- ggplot(tdf, aes(x=datetime, y=value, color=variable))
vis <- vis + geom_line(stat='identity')
vis <- vis + scale_x_datetime()
...which at least prints a plot, with a line corresponding to the values of each factor... by second. I cannot get it to bin the sum(value) operation for, say, an hour or a day or a week without doing a bunch of work to pre-aggregate the data.
Help?
Edit: Apologies to anyone whose R session choked on this test data. I've cut it back.
Alright, I think this is what you want. I've cut down your dataset dramatically; the posted one is way too big for testing this stuff out.
library(ggplot2)
tdf <- data.frame('datetime' = seq(from=as.POSIXct('2012-01-01 00:00:00'),
                                   to=as.POSIXct('2012-01-01 00:10:59'), by=1))
tdf$variable <- rep(c('a','b','c'), length.out=length(tdf$datetime))
tdf$value <- sample(1:10, length(tdf$datetime), replace=TRUE)
tdf$variable <- factor(tdf$variable)
vis2 <- ggplot(tdf, aes(datetime, color=variable)) +
  geom_histogram(binwidth=5, aes(weight=value), position="dodge") +
  scale_x_datetime(limits=c(min(tdf$datetime), max(tdf$datetime)))
geom_histogram() uses stat_bin(), so you can change your bins with binwidth (on a date-time axis, binwidth is in seconds). By default it gets the counts, but if you want the sum, add the weight argument in aes(). When this was first written, geom_bar() also binned with stat_bin(); since ggplot2 2.0.0 geom_bar() uses stat_count() instead, which is why the code above uses geom_histogram(). Let me know if this is not answering your question.
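A quick way to confirm that the weight mapping turns bin counts into sums is to look at the layer data ggplot2 computes, via ggplot_build(). This is a minimal sketch that assumes a current ggplot2 (2.0.0 or later, where geom_histogram() does the binning); the one-minute binwidth, the fixed seed and the UTC time zone are illustrative choices, not part of the original answer.

```r
library(ggplot2)

set.seed(42)  # reproducible sample values
# Small test data: 11 minutes of per-second observations (UTC avoids DST surprises)
tdf <- data.frame(datetime = seq(from = as.POSIXct('2012-01-01 00:00:00', tz = 'UTC'),
                                 to   = as.POSIXct('2012-01-01 00:10:59', tz = 'UTC'),
                                 by = 1))
tdf$variable <- factor(rep(c('a', 'b', 'c'), length.out = nrow(tdf)))
tdf$value <- sample(1:10, nrow(tdf), replace = TRUE)

# weight = value makes stat_bin() add up `value` in each bin instead of
# counting rows; for a POSIXct x axis the binwidth is in seconds.
p <- ggplot(tdf, aes(datetime, fill = variable, weight = value)) +
  geom_histogram(binwidth = 60, position = "dodge")

# ggplot_build() returns the computed layer data; `count` holds the weighted bin totals.
binned <- ggplot_build(p)$data[[1]]
print(head(binned[, c("group", "xmin", "xmax", "count")]))

# Every observation lands in exactly one bin, so the bin totals should
# add up to the grand total of `value`.
all.equal(sum(binned$count), sum(tdf$value))
```

The same check works for any binwidth, since changing the bucket size only regroups the rows; it never drops or double-counts them.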
BTW, with the way this specific data is set up, it would probably make sense to separate your variables using facets (again with geom_histogram(), which is what does the binning in current ggplot2), i.e.:
vis2 <- ggplot(tdf, aes(datetime, fill=variable)) +
  geom_histogram(binwidth=100, aes(weight=value), position="dodge") +
  scale_x_datetime(limits=c(min(tdf$datetime), max(tdf$datetime))) +
  facet_wrap(~variable)
Otherwise, because dodged bars sit side by side, it can look as if the variables fall in different time bins.
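The question originally drew lines rather than bars. As a sketch under the same assumptions (current ggplot2, UTC timestamps), geom_freqpoly() bins with the same stat_bin() machinery, so mapping weight = value gives one summed line per variable. The one-day dataset and the hourly buckets (binwidth = 3600 seconds) are illustrative choices.

```r
library(ggplot2)

set.seed(1)
# One day of per-second observations: 86,400 rows, small enough to plot quickly
tdf <- data.frame(datetime = seq(from = as.POSIXct('2012-01-01 00:00:00', tz = 'UTC'),
                                 to   = as.POSIXct('2012-01-01 23:59:59', tz = 'UTC'),
                                 by = 1))
tdf$variable <- factor(rep(c('a', 'b', 'c'), length.out = nrow(tdf)))
tdf$value <- sample(1:10, nrow(tdf), replace = TRUE)

# geom_freqpoly() bins with stat_bin() and joins the bin heights with lines;
# weight = value turns each height into the sum of `value` over an hour-wide bin.
hourly <- ggplot(tdf, aes(datetime, colour = variable, weight = value)) +
  geom_freqpoly(binwidth = 3600)

print(hourly)
```

Changing binwidth to 86400 (a day) or 604800 (a week) re-buckets the plot without touching the data. stat_bin() decides where the first bin starts; its boundary and center arguments control that if the buckets need to line up with clock hours.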