Overlay normal curve to histogram in R

I have managed to find online how to overlay a normal curve on a histogram in R, but I would like to retain the usual "frequency" y-axis of a histogram. See the two code segments below, and notice how in the second the y-axis is replaced with "density". How can I keep that y-axis as "frequency", as it is in the first plot?

AS A BONUS: I'd like to mark the SD regions (up to 3 SD) on the density curve as well. How can I do this? I tried abline, but the line extends to the top of the graph and looks ugly.

    g = d$mydata
    hist(g)

[Image: histogram of g with a "frequency" y-axis]

    g = d$mydata
    m <- mean(g)
    std <- sqrt(var(g))
    hist(g, density=20, breaks=20, prob=TRUE,
         xlab="x-variable", ylim=c(0, 2),
         main="normal curve over histogram")
    curve(dnorm(x, mean=m, sd=std),
          col="darkblue", lwd=2, add=TRUE, yaxt="n")

[Image: histogram of g with the normal curve overlaid and a "density" y-axis]

See how in the image above, the y-axis is "density". I'd like to get that to be "frequency".

asked Nov 19 '13 17:11 by StanLe


2 Answers

Here's a nice easy way I found:

    h <- hist(g, breaks = 10, density = 10,
              col = "lightgray", xlab = "Accuracy", main = "Overall")
    xfit <- seq(min(g), max(g), length.out = 40)
    yfit <- dnorm(xfit, mean = mean(g), sd = sd(g))
    # Scale density to counts: bin width times number of observations
    yfit <- yfit * diff(h$mids[1:2]) * length(g)
    lines(xfit, yfit, col = "black", lwd = 2)
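The scaling factor used above is the bin width times the number of observations. Here is a minimal check of that relationship, assuming equal-width bins and using the built-in `mtcars$mpg` as stand-in data, since the question's `g` is not shown:

```r
# Stand-in data; the question's `g` (d$mydata) is not available here.
g <- mtcars$mpg

# plot = FALSE computes the bins without drawing anything.
h <- hist(g, breaks = 10, plot = FALSE)

bin_width <- diff(h$breaks)[1]   # assumes all bins have the same width
n <- length(g)

# hist() computes density = counts / (n * bin_width),
# so multiplying density by n * bin_width recovers the counts.
rescaled <- h$density * n * bin_width
all.equal(rescaled, as.numeric(h$counts))
```

Because the normal density is on the same scale as `h$density`, multiplying it by the same factor puts it on the count axis.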
answered Sep 20 '22 08:09 by StanLe


You need to find the right multiplier to convert density (an estimated curve where the area beneath the curve is 1) to counts. This can be easily calculated from the hist object.

    myhist <- hist(mtcars$mpg)
    multiplier <- myhist$counts / myhist$density
    mydensity <- density(mtcars$mpg)
    mydensity$y <- mydensity$y * multiplier[1]

    plot(myhist)
    lines(mydensity)

[Image: histogram of mtcars$mpg with the rescaled density curve overlaid]
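One caveat with `multiplier[1]`: `counts / density` is `0 / 0` for any empty bin, so the first element is `NaN` if the first bin happens to be empty (which can occur with user-supplied breaks). A sketch of an alternative that takes the factor from the sample size and bin width instead, using toy data made up for illustration and assuming equal-width bins:

```r
# Toy data chosen so that the first two bins are empty (illustrative only).
x <- c(5, 6, 7)
h <- hist(x, breaks = c(0, 2, 4, 6, 8), plot = FALSE)

# Element-wise ratio: NaN wherever a bin has zero count.
ratio <- h$counts / h$density
is.nan(ratio[1])

# Alternative: number of observations times the (common) bin width.
multiplier <- length(x) * diff(h$breaks)[1]
multiplier
```

For non-empty bins the two approaches agree, so either works on `mtcars$mpg` with the default breaks.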

A more complete version, with a normal density and lines at each standard deviation away from the mean (including the mean):

    myhist <- hist(mtcars$mpg)
    multiplier <- myhist$counts / myhist$density
    mydensity <- density(mtcars$mpg)
    mydensity$y <- mydensity$y * multiplier[1]

    plot(myhist)
    lines(mydensity)

    myx <- seq(min(mtcars$mpg), max(mtcars$mpg), length.out = 100)
    mymean <- mean(mtcars$mpg)
    mysd <- sd(mtcars$mpg)

    normal <- dnorm(x = myx, mean = mymean, sd = mysd)
    lines(myx, normal * multiplier[1], col = "blue", lwd = 2)

    sd_x <- seq(mymean - 3 * mysd, mymean + 3 * mysd, by = mysd)
    sd_y <- dnorm(x = sd_x, mean = mymean, sd = mysd) * multiplier[1]

    segments(x0 = sd_x, y0 = 0, x1 = sd_x, y1 = sd_y, col = "firebrick4", lwd = 2)
answered Sep 21 '22 08:09 by Gregor Thomas