
How to add a Gaussian curve to a histogram created with qplot?

I have a question that is probably similar to "Fitting a density curve to a histogram in R". Using qplot, I have created 7 histograms with this command:

 qplot(V1, data=data, binwidth=10, facets=V2~.)

For each slice, I would like to add a fitted Gaussian curve. When I try to use the lines() function, I get this error:

Error in plot.xy(xy.coords(x, y), type = type, ...) : 
plot.new has not been called yet

What is the command to do it correctly?

asked Aug 24 '11 at 21:08 by mkk



People also ask

How do you add a normal curve to a histogram in R?

A basic histogram can be created with the hist function. To add a normal curve or a density line, you first need a density histogram, which you get by passing prob = TRUE as an argument; this puts the bar heights on the same scale as the curve.
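A minimal base-R sketch of that approach, assuming simulated data (the vector name `vals` and the parameters used to generate it are illustrative only). The fitted mean and standard deviation are computed before calling `curve()`, because inside `curve()` the symbol `x` refers to the plotting grid, not to your data.

```r
# Illustrative data: 200 draws from N(5, 2); replace with your own vector
set.seed(1)
vals <- rnorm(200, mean = 5, sd = 2)

# prob = TRUE scales the bars to densities (total bar area = 1),
# the same scale that dnorm() returns
h <- hist(vals, prob = TRUE, main = "Histogram with fitted normal curve")

# Fit the normal by the sample mean and standard deviation
m <- mean(vals)
s <- sd(vals)

# Inside curve(), 'x' is the grid of x-axis values, so use m and s here
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2)
```

If the curve's peak is cut off, pass a larger `ylim` to `hist()`.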

How do you add a normal curve to a histogram in Excel?

The closer the normal curve is to your histogram, the more likely it is that the data are normally distributed. To use this approach for the data in column B of Figure 1, press Ctrl-m and select the Histogram and Normal Curve Overlay option, then fill in the dialog box that appears as shown in Figure 6.

How do I create a normal curve in Ggplot with specific means and SD?

To create a normal curve, create a ggplot base layer whose x-axis covers the range you want (for example -4 to 4), and map the x aesthetic to that range (aes(x = x)). Then add stat_function with fun = dnorm to draw the normal curve. A specific mean and standard deviation are passed to dnorm through the args argument of stat_function, e.g. args = list(mean = 1, sd = 0.5).
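A sketch of that recipe with a specific mean and standard deviation, assuming ggplot2 is installed; the values mean = 1 and sd = 0.5 and the x range -1 to 3 are arbitrary choices for illustration. `stat_function()` forwards the `args` list to `fun`, which is how the parameters reach `dnorm()`.

```r
library(ggplot2)

# Base layer: only the x range matters; stat_function() evaluates over it
p <- ggplot(data.frame(x = c(-1, 3)), aes(x = x)) +
  # args are passed on to dnorm(), giving a N(mean = 1, sd = 0.5) curve
  stat_function(fun = dnorm, args = list(mean = 1, sd = 0.5), colour = "red")

print(p)
```

By default `stat_function()` evaluates the function at 101 points (its `n` argument); raise `n` for a smoother curve.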

What is a Gaussian histogram?

The Gaussian form of a histogram plot overlays a best-fit Gaussian on the histogram of a sample of data. In fact, all it does is calculate the mean and standard deviation of the sample and plot the corresponding Gaussian curve. The plot also reports the mean and standard deviation values.
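To mirror that idea on a count histogram (like the binwidth=10 plot in the question), the fitted density has to be rescaled: the expected count in a bin of width w is roughly n * w * f(x), where f is the fitted normal density. A base-R sketch under that assumption, with illustrative data and a hypothetical bin width of 5, that reports the fitted mean and standard deviation in the title:

```r
# Illustrative data; replace with your own numeric vector
set.seed(42)
vals <- rnorm(300, mean = 50, sd = 10)
binwidth <- 5

# Breaks aligned to multiples of binwidth that cover the whole sample
breaks <- seq(floor(min(vals) / binwidth) * binwidth,
              ceiling(max(vals) / binwidth) * binwidth,
              by = binwidth)

# The "best fit" Gaussian is just the sample mean and standard deviation
m <- mean(vals)
s <- sd(vals)

# Count histogram (equal-width breaks, so hist() plots counts by default)
h <- hist(vals, breaks = breaks,
          main = sprintf("Fitted normal: mean = %.2f, sd = %.2f", m, s))

# Rescale the density to the count scale: n * binwidth * dnorm(x)
curve(length(vals) * binwidth * dnorm(x, mean = m, sd = s),
      add = TRUE, col = "red", lwd = 2)
```

The alternative is to leave the curve alone and put the bars on the density scale instead, which is what `prob = TRUE` in base R and `y = ..density..` in ggplot2 do.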


2 Answers

Have you tried stat_function?

+ stat_function(fun = dnorm)

You'll probably want to plot the histograms using aes(y = ..density..) so that the bars show density values rather than counts, which matches the scale of the values dnorm returns.

A lot of useful information can be found in this question, including some advice on plotting different normal curves on different facets.

Here are some examples:

# Two groups of 100: x ~ N(0, 1) labelled "a" and x ~ N(2, 0.5^2) labelled "b"
dat <- data.frame(x = c(rnorm(100), rnorm(100, 2, 0.5)),
                  a = rep(letters[1:2], each = 100))

Overlay the same standard normal density (mean 0, sd 1) on each facet:

library(ggplot2)

ggplot(data = dat, aes(x = x)) +
  facet_wrap(~a) +
  geom_histogram(aes(y = ..density..)) +       # bars on the density scale
  stat_function(fun = dnorm, colour = "red")   # N(0, 1) on every facet


Following the question I linked to, create a separate data frame containing a normal curve fitted to each group (using that group's mean and standard deviation):

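library(plyr)  # provides ddply()

# Common x grid spanning the whole data set
grid <- with(dat, seq(min(x), max(x), length.out = 100))
# For each level of a, evaluate a normal density with that group's mean and sd
normaldens <- ddply(dat, "a", function(df) {
  data.frame(
    predicted = grid,
    density = dnorm(grid, mean(df$x), sd(df$x))
  )
})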

And plot them separately using geom_line:

ggplot(data = dat,aes(x = x)) + 
    facet_wrap(~a) + 
    geom_histogram(aes(y = ..density..)) + 
    geom_line(data = normaldens, aes(x = predicted, y = density), colour = "red")
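`ddply()` comes from the plyr package, which needs `library(plyr)` and is now retired. A base-R sketch that builds the same kind of `normaldens` frame, assuming the `dat` layout from the example above (columns `x` and `a`). Unlike `ddply()`, `split()` plus `lapply()` does not add the grouping column for you, so `a` is added explicitly; without it `facet_wrap(~a)` could not assign each curve to its panel.

```r
# Same simulated data as above: two groups with different means and sds
set.seed(1)
dat <- data.frame(x = c(rnorm(100), rnorm(100, 2, 0.5)),
                  a = rep(letters[1:2], each = 100))

# Common x grid spanning all of the data
grid <- with(dat, seq(min(x), max(x), length.out = 100))

# One fitted normal curve per group, using that group's mean and sd
normaldens <- do.call(rbind, lapply(split(dat, dat$a), function(df) {
  data.frame(a = df$a[1],                       # keep the facet variable
             predicted = grid,
             density = dnorm(grid, mean(df$x), sd(df$x)))
}))
```

The resulting frame has the same `a`, `predicted` and `density` columns, so it can be passed to `geom_line(data = normaldens, ...)` unchanged.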


answered Oct 13 '22 at 12:10 by joran



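ggplot2 uses a different graphics paradigm from base graphics, so base functions such as lines() cannot draw on a ggplot (hence the "plot.new has not been called yet" error). Although you can use grid graphics with it, the best way is to add a new stat_function layer to the plot. The ggplot2 code is shown below.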

Note that I couldn't get this to work using qplot, but the transition to ggplot is reasonably straightforward; the most important difference is that your data must be in a data.frame.

Also note the explicit mapping of the y aesthetic, aes(y=..density..). This is slightly unusual, but it puts the histogram on the density scale so that it is comparable with the curve that stat_function draws (dnorm returns densities, not counts):

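library(ggplot2)
# Simulated data: 700 standard-normal values spread across 7 groups (A to G)
data <- data.frame(V1 = rnorm(700), V2 = sample(LETTERS[1:7], 700, replace = TRUE))
ggplot(data, aes(x = V1)) +
  stat_bin(aes(y = ..density..)) +   # histogram on the density scale
  stat_function(fun = dnorm) +       # standard normal curve, N(0, 1)
  facet_grid(V2 ~ .)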


answered Oct 13 '22 at 11:10 by Andrie
