I am very new to Statistics and R. Maybe this is a very trivial question, but I don't really understand how this works.
Suppose I use dnorm(5, 0, 2.5). What does that mean?
I saw some resources that said this function computes the height of the density curve at that point.
But I have also read that the exact probability of any single number is 0 in a continuous distribution. So my question is: if I can find the height or probability of a certain value, how can it be 0?
I know I have mixed up some concepts, but I'm unable to find where I'm wrong. It would be great if you could spare some time to help me understand this. Thanks in advance.
dnorm is the R function that calculates the probability density function (p.d.f.) f of the normal distribution. As with pnorm and qnorm, optional arguments specify the mean and standard deviation of the distribution.
The qnorm function provides the quantile of the normal distribution at a specified cumulative probability. An additional function, rnorm, draws random values from the normal distribution (but this is discussed in detail in the random sampling tutorial).
dnorm gives the density, pnorm gives the distribution function, qnorm gives the quantile function, and rnorm generates random deviates. The length of the result is determined by n for rnorm, and is the maximum of the lengths of the numerical arguments for the other functions.
qnorm returns the value of the inverse cumulative distribution function (the inverse of the cdf) of the normal distribution, given a probability p, a population mean μ, and a population standard deviation σ.
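As a rough illustration of how the four functions relate, here is a minimal sketch using the mean 0 and standard deviation 2.5 from the question. The variable names and the seed are my own choices, not part of any of the quoted sources.

```r
mu <- 0
sigma <- 2.5

height <- dnorm(5, mu, sigma)   # density (curve height) at x = 5, not a probability
p      <- pnorm(5, mu, sigma)   # P(X <= 5), the area to the left of 5
q      <- qnorm(p, mu, sigma)   # inverse of pnorm: recovers 5 up to rounding

set.seed(1)                     # arbitrary seed, only for reproducibility
draws  <- rnorm(3, mu, sigma)   # three random values; length is set by n = 3

# For dnorm, pnorm and qnorm the result is as long as the longest argument
heights <- dnorm(c(-5, 0, 5), mu, sigma)
```

Here qnorm(p, ...) undoes pnorm(5, ...), which is what "inverse cumulative distribution function" means in practice.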
The density returns a number that does not in itself translate directly into a probability. But it gives the height of a curve that, if drawn over the full range of possible numbers, has an area underneath it of exactly 1.
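One way to see the difference between a height and a probability is to compare the density at 5 with the probability of landing in a narrow band around 5. This is only a sketch; the band width of 0.1 and the variable names are assumptions of mine.

```r
mu <- 0
sigma <- 2.5

# Height of the density curve at 5 (roughly 0.0216); this is not P(X = 5)
h <- dnorm(5, mu, sigma)

# Exact probability of landing in a narrow interval centred on 5
width  <- 0.1
p_band <- pnorm(5 + width / 2, mu, sigma) - pnorm(5 - width / 2, mu, sigma)

# Height times width approximates that small area
approx_band <- h * width

# Shrinking the interval to a single point leaves zero area
p_point <- pnorm(5, mu, sigma) - pnorm(5, mu, sigma)
```

p_band and approx_band should agree closely, while p_point is 0: the height stays the same however narrow the band becomes, but the area (the probability) shrinks with the width.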
Consider this. I make a vector x of evenly spaced numbers from -7.5 to 7.5, 0.1 apart, and get the density of a normal variable with mean 0 and standard deviation 2.5 for each value of x:
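# Grid from -7.5 to 7.5 in steps of 0.1 (to = 7.55 overshoots slightly so that 7.5 is included)
x <- seq(from = -7.5, to = 7.55, by = 0.1)
# Normal density (mean 0, sd 2.5) evaluated at every grid point
y <- dnorm(x, 0, 2.5)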
The approximate value of the area under the curve formed by those densities (which I have stored as y), multiplied by their distance apart (0.1), is nearly 1:
> sum(y * 0.1)
[1] 0.9974739
If you did this properly with calculus over the whole real line, rather than approximating it with a finite sum over a limited range, it would be exactly one.
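The calculus version can be checked numerically with R's integrate function. This is a sketch rather than a proof: integrate returns a numerical approximation, and the names total and part are my own.

```r
# Numerically integrate the density over the whole real line
total <- integrate(dnorm, lower = -Inf, upper = Inf, mean = 0, sd = 2.5)

# The same integral restricted to -7.5 .. 7.5 (three standard deviations each side)
part <- integrate(dnorm, lower = -7.5, upper = 7.5, mean = 0, sd = 2.5)

total$value   # very close to 1
part$value    # roughly 0.9973
```

The restricted integral shows that most of the gap between the sum and 1 comes from stopping at ±7.5, not from the step size of 0.1.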
Why is this useful? The cumulative area under parts of the curve can be used to estimate the probability of the variable falling anywhere in a particular range, even though, as one of your sources points out, the chance of any precise number is technically zero for a continuous variable.
Consider this graphic. The area of the shaded space shows the chance of a variable from your normal distribution (mean zero, standard deviation 2.5) being between -7.5 and 4. This leads to many useful applications.
Made with:
library(ggplot2)
d <- data.frame(x, y)
ggplot(d, aes(x = x, y = y)) +
  geom_line() +
  geom_point() +
  geom_ribbon(fill = "steelblue", aes(ymax = y), ymin = 0, alpha = 0.5,
              data = subset(d, x <= 4)) +
  annotate("text", x = -4, y = 0.13,
           label = "Each point is an individual density\nestimate of dnorm(x, 0, 2.5)") +
  annotate("text", x = -.3, y = 0.02,
           label = "Filled area under the curve shows the cumulative probability\nof getting a number as high as a given x, in this case 4") +
  ggtitle("Density of a random normal variable with mean zero and standard deviation 2.5")
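The shaded area itself can be computed without drawing anything. A minimal sketch, assuming the same mean 0 and standard deviation 2.5 and the range -7.5 to 4 described above; the variable names are mine.

```r
mu <- 0
sigma <- 2.5

# Area under the density between -7.5 and 4, from the distribution function
# (roughly 0.944)
p_exact <- pnorm(4, mu, sigma) - pnorm(-7.5, mu, sigma)

# The same area approximated by summing density heights times their spacing
grid     <- seq(from = -7.5, to = 4, by = 0.1)
p_approx <- sum(dnorm(grid, mu, sigma) * 0.1)
```

p_approx is slightly rougher than p_exact because it adds up rectangles instead of integrating, but both describe the probability of a value falling between -7.5 and 4.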