I have managed to find online how to overlay a normal curve to a histogram in R, but I would like to retain the normal "frequency" y-axis of a histogram. See two code segments below, and notice how in the second, the y-axis is replaced with "density". How can I keep that y-axis as "frequency", as it is in the first plot.
AS A BONUS: I'd like to mark the SD regions (up to 3 SD) on the density curve as well. How can I do this? I tried abline
, but the line extends to the top of the graph and looks ugly.
g = d$mydata hist(g)
g = d$mydata m<-mean(g) std<-sqrt(var(g)) hist(g, density=20, breaks=20, prob=TRUE, xlab="x-variable", ylim=c(0, 2), main="normal curve over histogram") curve(dnorm(x, mean=m, sd=std), col="darkblue", lwd=2, add=TRUE, yaxt="n")
See how in the image above, the y-axis is "density". I'd like to get that to be "frequency".
If you want to overlay a normal curve over your histogram you will need to calculate it with the dnorm function based on a grid of values and the mean and standard deviation of the data. Then you can add it with lines .
Here's a nice easy way I found:
h <- hist(g, breaks = 10, density = 10, col = "lightgray", xlab = "Accuracy", main = "Overall") xfit <- seq(min(g), max(g), length = 40) yfit <- dnorm(xfit, mean = mean(g), sd = sd(g)) yfit <- yfit * diff(h$mids[1:2]) * length(g) lines(xfit, yfit, col = "black", lwd = 2)
You need to find the right multiplier to convert density (an estimated curve where the area beneath the curve is 1) to counts. This can be easily calculated from the hist
object.
myhist <- hist(mtcars$mpg) multiplier <- myhist$counts / myhist$density mydensity <- density(mtcars$mpg) mydensity$y <- mydensity$y * multiplier[1] plot(myhist) lines(mydensity)
A more complete version, with a normal density and lines at each standard deviation away from the mean (including the mean):
myhist <- hist(mtcars$mpg) multiplier <- myhist$counts / myhist$density mydensity <- density(mtcars$mpg) mydensity$y <- mydensity$y * multiplier[1] plot(myhist) lines(mydensity) myx <- seq(min(mtcars$mpg), max(mtcars$mpg), length.out= 100) mymean <- mean(mtcars$mpg) mysd <- sd(mtcars$mpg) normal <- dnorm(x = myx, mean = mymean, sd = mysd) lines(myx, normal * multiplier[1], col = "blue", lwd = 2) sd_x <- seq(mymean - 3 * mysd, mymean + 3 * mysd, by = mysd) sd_y <- dnorm(x = sd_x, mean = mymean, sd = mysd) * multiplier[1] segments(x0 = sd_x, y0= 0, x1 = sd_x, y1 = sd_y, col = "firebrick4", lwd = 2)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With