I am very new to statistics and R. This may be a trivial question, but I don't understand how this works. Suppose I call <code>dnorm(5, 0, 2.5)</code>. What does that mean? Some resources say this function computes the height of the density curve at that point. But I have also read that the probability of any exact value is 0 in a continuous distribution. So my question is: if I can find the height, or probability, of a certain value, how can it be 0? I know I have mixed up some concepts, but I can't see where I'm going wrong. I'd be grateful if you could help me understand this. Thanks in advance.


How does dnorm work?

1 Answer

The density function returns a number that does not by itself translate directly into a probability. Instead, it gives the height of a curve which, drawn over the full range of possible values, has a total area underneath it of exactly 1.
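To make the "height, not probability" point concrete, here is a small sketch (the variable names are my own, not from the question). It reproduces dnorm(5, 0, 2.5) from the normal density formula and shows that a density can be larger than 1, which a probability never can be.

```r
# Height of the normal density curve (mean 0, sd 2.5) at x = 5
height <- dnorm(5, mean = 0, sd = 2.5)

# The same number from the normal density formula written out by hand
mu <- 0
sigma <- 2.5
by_hand <- exp(-(5 - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))

print(height)                      # roughly 0.0216
print(all.equal(height, by_hand))  # TRUE

# A narrow distribution has a tall curve: this height is well above 1,
# so it cannot be a probability
tall <- dnorm(0, mean = 0, sd = 0.1)
print(tall)                        # roughly 3.99
```

So the value returned for 5 is just how high the curve is at that point. It only turns into a probability once you multiply by a width, that is, once you look at an area.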

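Consider this. Suppose I make a vector x of evenly spaced numbers from -7.5 to 7.5, 0.1 apart, and compute the density of a normal variable with mean 0 and standard deviation 2.5 at each value of x. (The upper limit of 7.55 in the code still gives a last value of 7.5, since the next step, 7.6, would overshoot it.)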


x <- seq(from = -7.5, to = 7.55, by = 0.1)
y <- dnorm(x, 0, 2.5)

Summing those densities (which I have stored as y), each multiplied by the distance between points (0.1), approximates the area under the curve, and the result is nearly 1:


> sum(y * 0.1)
[1] 0.9974739

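If you did this properly with calculus over the whole real line, rather than approximating it with a sum over a finite grid, the area would be exactly 1. Most of the shortfall here comes from x covering only three standard deviations either side of the mean.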
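If you want to check the "properly with calculus" claim numerically, one option is base R's integrate(), which does numerical integration. This is only a sketch, and the names total and within_grid are my own.

```r
# Area under the whole density curve, from -Inf to Inf
total <- integrate(dnorm, lower = -Inf, upper = Inf, mean = 0, sd = 2.5)

# Area under the curve over the range covered by the grid only
within_grid <- integrate(dnorm, lower = -7.5, upper = 7.5, mean = 0, sd = 2.5)

print(total$value)        # 1, up to numerical error
print(within_grid$value)  # about 0.9973
```

The second number is close to the grid sum, which suggests the missing area is mostly in the tails beyond -7.5 and 7.5 rather than a flaw in the approximation.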

Why is this useful? The area under part of the curve gives the probability of the variable falling anywhere in a particular range, even though, as one of your sources points out, the chance of any one precise number is technically zero for a continuous variable.
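Here is a rough illustration of why a single exact value has probability zero even though the curve has a positive height there. It takes ever narrower windows around 5 and computes the probability of landing inside each one with pnorm(). The window widths are arbitrary choices of mine.

```r
# Half-widths of windows centred on 5, getting narrower each time
half_width <- c(1, 0.1, 0.01, 0.001)

# Exact probability of landing inside each window
prob <- pnorm(5 + half_width, mean = 0, sd = 2.5) -
  pnorm(5 - half_width, mean = 0, sd = 2.5)

# Rectangle approximation: window width times the height of the curve at 5
rect_area <- 2 * half_width * dnorm(5, mean = 0, sd = 2.5)

print(data.frame(half_width, prob, rect_area))
```

As the window shrinks, the probability heads towards zero, while the ratio of probability to width settles on the density. That is the sense in which dnorm gives a height and not a probability.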

Consider this graphic. The shaded area shows the probability that a variable from your normal distribution (mean 0, standard deviation 2.5) falls between -7.5 and 4. Areas like this underpin many useful applications, such as p-values and confidence intervals.

<img src="https://i.stack.imgur.com/F9xOB.png" alt="enter image description here">

Made with:


library(ggplot2)

# Combine the grid of x values and their densities into one data frame
d <- data.frame(x, y)

ggplot(d, aes(x = x, y = y)) +
  geom_line() +
  geom_point() +
  # Shade the area under the curve for all grid points up to x = 4
  geom_ribbon(fill = "steelblue", aes(ymax = y), ymin = 0, alpha = 0.5, data = subset(d, x <= 4)) +
  annotate("text", x= -4, y = 0.13, label = "Each point is an individual density\nestimate of dnorm(x, 0, 2.5)") +
  annotate("text", x = -.3, y = 0.02, label = "Filled area under the curve shows the cumulative probability\nof getting a number as high as a given x, in this case 4") +
  ggtitle("Density of a random normal variable with mean zero and standard deviation 2.5")
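The shaded area itself does not need to be read off the plot. As a sketch, it can be computed with pnorm(), the cumulative distribution function that goes with dnorm(). The names p_shaded and p_check are my own.

```r
# Probability of falling between -7.5 and 4 for a normal variable
# with mean 0 and standard deviation 2.5
p_shaded <- pnorm(4, mean = 0, sd = 2.5) - pnorm(-7.5, mean = 0, sd = 2.5)
print(p_shaded)  # about 0.944

# Cross-check: numerically integrate the density over the same range
p_check <- integrate(dnorm, lower = -7.5, upper = 4, mean = 0, sd = 2.5)$value
print(all.equal(p_shaded, p_check, tolerance = 1e-6))  # TRUE
```

In practice you would use dnorm() when you need the height of the curve, for plotting or for likelihoods, and pnorm() when you need an actual probability.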


answered Sep 18 '22 06:09

Peter Ellis


Tags: r, statistics, probability-distribution

lu5er
