Seeing if data is normally distributed in R

Tags:

r

normal-distribution

Can someone please help me fill in the following function in R:

    # data is a single vector of decimal values
    normally.distributed <- function(data) {
      if (data is normal)   # pseudocode: this is the check I need help with
        return(TRUE)
      else
        return(FALSE)
    }

507

asked Oct 16 '11 01:10

CodeGuy

2 Answers

Normality tests don't do what most people think they do. Shapiro-Wilk, Anderson-Darling, and others are null hypothesis tests AGAINST the assumption of normality. They should not be used to decide whether to use normal-theory statistical procedures. In fact, they are of virtually no value to the data analyst. Under what conditions are we interested in rejecting the null hypothesis that the data are normally distributed? I have never come across a situation where a normality test is the right thing to do. When the sample size is small, even big departures from normality are not detected, and when the sample size is large, even the smallest deviation from normality will lead to a rejected null.

For example:

> set.seed(100)
> x <- rbinom(15,5,.6)
> shapiro.test(x)

    Shapiro-Wilk normality test

data:  x
W = 0.8816, p-value = 0.0502

> x <- rlnorm(20,0,.4)
> shapiro.test(x)

    Shapiro-Wilk normality test

data:  x
W = 0.9405, p-value = 0.2453

So, in both these cases (binomial and lognormal variates) the p-value is > 0.05, so we fail to reject the null (that the data are normal). Does this mean we are to conclude that the data are normal? (hint: the answer is no). Failure to reject is not the same thing as accepting. This is hypothesis testing 101.
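To see how often a small sample hides a real departure from normality, the single draw above can be repeated many times. The following is only a sketch in base R: the helper name `rejection_rate`, the number of simulations, and the 5% cutoff are my own choices, not part of the original answer. It estimates the share of lognormal samples of size 20 that `shapiro.test` rejects, with normal samples of the same size as a baseline.

```r
# Sketch: how often does shapiro.test() reject at the 5% level
# when the sample is small? (All names below are illustrative.)
set.seed(100)
n_sims <- 2000  # number of simulated samples; arbitrary choice

# Proportion of simulated samples whose Shapiro-Wilk p-value is below alpha.
# draw_sample is a zero-argument function that returns one fresh sample.
rejection_rate <- function(draw_sample, n_sims, alpha = 0.05) {
  p_values <- replicate(n_sims, shapiro.test(draw_sample())$p.value)
  mean(p_values < alpha)
}

# Clearly non-normal data (same lognormal setup as the example above)
lognormal_rate <- rejection_rate(function() rlnorm(20, 0, 0.4), n_sims)

# Truly normal data, as a baseline: this should sit near alpha
normal_rate <- rejection_rate(function() rnorm(20), n_sims)

cat("Rejection rate, lognormal, n = 20:", lognormal_rate, "\n")
cat("Rejection rate, normal,    n = 20:", normal_rate, "\n")
```

One minus the lognormal rejection rate is the share of samples for which the test fails to flag data that are known not to be normal, which is the "failure to reject is not accepting" point in numbers.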

But what about larger sample sizes? Let's take the case where the distribution is very nearly normal.

> library(nortest)
> x <- rt(500000,200)
> ad.test(x)

    Anderson-Darling normality test

data:  x
A = 1.1003, p-value = 0.006975

> qqnorm(x)

[Figure: normal Q-Q plot of x, produced by qqnorm(x)]

Here we are using a t-distribution with 200 degrees of freedom. The qq-plot shows the distribution is closer to normal than any distribution you are likely to see in the real world, but the test rejects normality with a very high degree of confidence.

Does the significant test against normality mean that we should not use normal theory statistics in this case? (another hint: the answer is no :) )
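How small is the departure that the test picked up? One way to put a number on it, without drawing any random data, is to compare the t-distribution with 200 degrees of freedom to the standard normal directly. This is a rough sketch in base R; the grid of points and the list of probabilities are arbitrary choices of mine.

```r
# Sketch: compare the t-distribution (df = 200) with the standard normal.
# The grid and the probabilities below are arbitrary illustrative choices.
z <- seq(-6, 6, by = 0.001)

# Largest absolute difference between the two CDFs over the grid
cdf_gap <- abs(pt(z, df = 200) - pnorm(z))
max_cdf_gap <- max(cdf_gap)
z_at_max <- z[which.max(cdf_gap)]

cat("Largest CDF gap:", max_cdf_gap, "at z =", z_at_max, "\n")

# Side-by-side quantiles, including the tails where the two differ most
probs <- c(0.001, 0.01, 0.05, 0.5, 0.95, 0.99, 0.999)
quantile_table <- data.frame(
  prob   = probs,
  t200   = qt(probs, df = 200),
  normal = qnorm(probs)
)
print(quantile_table)
```

The gap on the probability scale is what a very large sample lets the test detect. Whether a difference of that size matters for the procedure you want to run is a separate question that the p-value does not answer.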

142

answered Oct 13 '22 19:10

Ian Fellows

I would also highly recommend the SnowsPenultimateNormalityTest in the TeachingDemos package. The documentation of the function is far more useful to you than the test itself, though. Read it thoroughly before using the test.
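A minimal sketch of trying it out, assuming the `TeachingDemos` package is installed (the guard skips the call if it is not). The sample `x` is made up for illustration. In an interactive session, open the help page with `?SnowsPenultimateNormalityTest` first, since the documentation is the point.

```r
# Sketch: run the test only if TeachingDemos is available.
# In an interactive session, read the help page first:
#   library(TeachingDemos); ?SnowsPenultimateNormalityTest
result <- NULL

if (requireNamespace("TeachingDemos", quietly = TRUE)) {
  set.seed(1)
  x <- rnorm(100)  # illustrative data, not from the question
  result <- TeachingDemos::SnowsPenultimateNormalityTest(x)
  print(result)
} else {
  message("TeachingDemos is not installed; ",
          "install.packages('TeachingDemos') to try this.")
}
```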

answered Oct 13 '22 20:10

Brian Diggs
