
Using stat_function and facet_wrap together in ggplot2 in R

Tags:

graph

r

ggplot2

I am trying to plot lattice-type data with ggplot2 and then superimpose a normal distribution over the sample data to illustrate how far from normal the underlying data is. I would like the overlaid normal distribution in each panel to have the same mean and standard deviation as that panel's data.

Here's an example:

    library(ggplot2)

    #make some example data
    dd<-data.frame(matrix(rnorm(144, mean=2, sd=2),72,2),c(rep("A",24),rep("B",24),rep("C",24)))
    colnames(dd) <- c("x_value", "Predicted_value",  "State_CD")

    #This works
    pg <- ggplot(dd) + geom_density(aes(x=Predicted_value)) +  facet_wrap(~State_CD)
    print(pg)

That all works great and produces a nice three-panel graph of the data. How do I add the normal distribution on top? It seems I would use stat_function, but this fails:

    #this fails
    pg <- ggplot(dd) + geom_density(aes(x=Predicted_value)) + stat_function(fun=dnorm) +  facet_wrap(~State_CD)
    print(pg)

It appears that the stat_function is not getting along with the facet_wrap feature. How do I get these two to play nicely?

------------EDIT---------

I tried to integrate ideas from two of the answers below and I am still not there. Using a combination of both answers, I can hack together this:

    library(ggplot2)
    library(plyr)

    #make some example data
    dd<-data.frame(matrix(rnorm(144, mean=2, sd=2),72,2),c(rep("A",24),rep("B",24),rep("C",24)))
    colnames(dd) <- c("x_value", "Predicted_value",  "State_CD")

    DevMeanSt <- ddply(dd, c("State_CD"), function(df)mean(df$Predicted_value))
    colnames(DevMeanSt) <- c("State_CD", "mean")
    DevSdSt <- ddply(dd, c("State_CD"), function(df)sd(df$Predicted_value) )
    colnames(DevSdSt) <- c("State_CD", "sd")
    DevStatsSt <- merge(DevMeanSt, DevSdSt)

    pg <- ggplot(dd, aes(x=Predicted_value))
    pg <- pg + geom_density()
    pg <- pg + stat_function(fun=dnorm, colour='red', args=list(mean=DevStatsSt$mean, sd=DevStatsSt$sd))
    pg <- pg + facet_wrap(~State_CD)
    print(pg)

This is really close, except that the normal distribution is not drawn correctly.

What am I doing wrong here?

JD Long asked Sep 04 '09 02:09



1 Answer

stat_function is designed to overlay the same function in every panel. (There's no obvious way to match up the parameters of the function with the different panels).

As Ian suggests, the best way is to generate the normal curves yourself, and plot them as a separate dataset (this is where you were going wrong before - merging just doesn't make sense for this example and if you look carefully you'll see that's why you're getting the strange sawtooth pattern).
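One plausible mechanism for the sawtooth, sketched in base R only: when `mean` and `sd` are vectors, `dnorm()` recycles them along the evaluation points, so neighbouring points on the curve are computed with different parameters. The three means below are made-up values chosen for illustration, not the ones from the question.

```r
# Three hypothetical per-panel means, passed as a single vector
# (analogous to args = list(mean = DevStatsSt$mean, ...) in the question)
means <- c(-5, 0, 5)

# A small grid of evaluation points
x <- seq(-6, 6, length.out = 9)

# dnorm() recycles `mean` along x: point i uses means[(i - 1) %% 3 + 1]
recycled <- dnorm(x, mean = means, sd = 1)

# The same values, built explicitly one point at a time
explicit <- vapply(seq_along(x), function(i) {
  dnorm(x[i], mean = means[(i - 1) %% 3 + 1], sd = 1)
}, numeric(1))

# Both vectors agree, which confirms the cycling through the three means
print(all.equal(recycled, explicit))
```

Because consecutive points alternate between three different curves, connecting them with a line gives a jagged shape rather than a single smooth bell curve.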

Here's how I'd go about solving the problem:

    library(ggplot2)
    library(plyr)  # for ddply()

    dd <- data.frame(
      predicted = rnorm(72, mean = 2, sd = 2),
      state = rep(c("A", "B", "C"), each = 24)
    )

    # common grid of x values spanning the observed range
    grid <- with(dd, seq(min(predicted), max(predicted), length.out = 100))

    # one normal curve per state, using that state's own mean and sd
    normaldens <- ddply(dd, "state", function(df) {
      data.frame(
        predicted = grid,
        density = dnorm(grid, mean(df$predicted), sd(df$predicted))
      )
    })

    ggplot(dd, aes(predicted)) +
      geom_density() +
      geom_line(aes(y = density), data = normaldens, colour = "red") +
      facet_wrap(~ state)
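For readers who would rather not depend on plyr, the per-state curves can probably be built with base R alone. This is a minimal sketch under a few assumptions: the `set.seed()` call and the `per_state` helper name are additions for illustration, and the result is meant to have the same columns (`state`, `predicted`, `density`) as the `normaldens` data frame above.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

dd <- data.frame(
  predicted = rnorm(72, mean = 2, sd = 2),
  state = rep(c("A", "B", "C"), each = 24)
)

# Common grid of x values spanning the observed range
grid <- seq(min(dd$predicted), max(dd$predicted), length.out = 100)

# Split by state, then evaluate a normal density with that state's mean and sd
per_state <- lapply(split(dd, dd$state), function(df) {
  data.frame(
    state = df$state[1],
    predicted = grid,
    density = dnorm(grid, mean(df$predicted), sd(df$predicted))
  )
})

# Stack the three pieces into one long data frame (100 rows per state)
normaldens <- do.call(rbind, per_state)
rownames(normaldens) <- NULL
```

The resulting data frame can then be passed as `data = normaldens` to a `geom_line(aes(y = density), ...)` layer, since faceting matches rows to panels through the shared `state` column.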

hadley answered Sep 21 '22 18:09
