I have a vector, in R, with 1521298 points, which have to be tested for normality. I chose the Shapiro-Wilk test, but the R function <code>shapiro.test()</code> says: <blockquote> Error in shapiro.test(z_scores) : sample size must be between 3 and 5000 </blockquote> Do you know any other function to test it or how to circumvent this issue?


Error in shapiro.test : sample size must be between

Q: Is the Shapiro-Wilk test good for large sample sizes?

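The Shapiro-Wilk test was originally designed for small samples (50 or fewer observations), but extensions of it handle samples of 2,000 or more, and R's shapiro.test() accepts between 3 and 5,000 observations. Keep in mind that normality tests are sensitive to sample size: with very large samples they detect even tiny, practically irrelevant departures from normality, so the p-value often falls below 0.05 even when the data are close to normal.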
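A minimal sketch of this sample-size effect, assuming simulated data: draws from a t distribution with 20 degrees of freedom serve as a hypothetical, mildly heavy-tailed stand-in for "almost normal" data. The same kind of data is tested at n = 50 and at n = 5000, the largest size shapiro.test() accepts.

```r
# Hypothetical data: mildly heavy-tailed values (t distribution, 20 df)
set.seed(42)
sw_small <- shapiro.test(rt(50, df = 20))    # small sample
sw_large <- shapiro.test(rt(5000, df = 20))  # largest size shapiro.test() accepts

# Compare the W statistics and p-values side by side
c(W_small = unname(sw_small$statistic), p_small = sw_small$p.value,
  W_large = unname(sw_large$statistic), p_large = sw_large$p.value)
```

The exact numbers depend on the seed, but typically p_large is far below 0.05 while p_small is not, even though the departure from normality is the same in both samples.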

Q: What does it mean when the p-value of a Shapiro-Wilk test is less than 0.05?

The null hypothesis of the Shapiro-Wilk test is that the sample comes from a normally distributed population; the alternative hypothesis is that it does not. If the p-value is less than 0.05, you reject the null hypothesis at the 5% significance level and conclude that the data are not normally distributed. A p-value above 0.05 does not prove normality; it only means the test found no significant evidence against it.
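A minimal sketch of this decision rule in R, using a hypothetical, clearly skewed sample (exponential draws) as the data. The 0.05 threshold is the conventional choice, not something the test itself requires.

```r
set.seed(1)
x <- rexp(200)            # hypothetical, strongly right-skewed data

res   <- shapiro.test(x)
alpha <- 0.05             # conventional significance level

if (res$p.value < alpha) {
  message("Reject H0: the data are unlikely to come from a normal distribution")
} else {
  message("Fail to reject H0: no significant evidence against normality")
}
```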

Q: What Shapiro-Wilk p-value indicates non-normality?

The Shapiro-Wilk test is one of several general-purpose normality tests designed to detect any departure from normality, and its power is comparable to that of the others. At the conventional 5% significance level, it rejects the hypothesis of normality when the p-value is less than or equal to 0.05.

2 Answers

R's shapiro.test() cannot be run on more than 5,000 records; it requires between 3 and 5,000 values.
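A quick way to reproduce the limit, using simulated values purely for illustration: the error from the question is raised for 5,001 values, while exactly 5,000 values are still accepted.

```r
# Base R's shapiro.test() (package stats) refuses samples outside 3..5000
msg <- tryCatch(
  shapiro.test(rnorm(5001)),
  error = function(e) conditionMessage(e)
)
print(msg)

# Exactly 5000 values is still accepted
ok <- shapiro.test(rnorm(5000))
ok$p.value
```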

You can run the Shapiro-Wilk test on only the first 5,000 values. If that helps you, use code like this (head() returns at most the first 5,000 elements):

shapiro.test(head(beaver2$temp, 5000))

Be aware, though, that the test then uses only the first 5,000 values of your data.
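If the data might be ordered (for example by time or by group), the first 5,000 values need not be representative of the whole vector. One alternative, sketched here and not part of the original answer, is to test a random subsample of 5,000 values instead. The name `z_scores` mirrors the question, but its contents below are simulated and purely hypothetical.

```r
set.seed(123)                          # reproducible subsample
z_scores <- rnorm(1521298)             # hypothetical stand-in for the real vector

idx <- sample(length(z_scores), 5000)  # 5000 distinct random positions
sw  <- shapiro.test(z_scores[idx])
sw$p.value
```

This still uses only 5,000 of the points, so it shares the limitation above; it only removes the dependence on the order of the data.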

On the other hand, if you need to use all the records in your sample, use another normality test, such as the Anderson-Darling test. You can also run both and compare the results, as in the script below:

# Clean the workspace
rm(list = ls())

# Install the required package (only needed once)
install.packages("nortest")
library(nortest)

# Example data to use (beaver2 ships with R's datasets package)
ModelData <- beaver2$temp

# Shapiro-Wilk test on (at most) the first 5000 records
shapiro.test(head(ModelData, 5000))$p.value

# Anderson-Darling normality test on all records
ad.test(ModelData)$p.value


answered Sep 25 '22 19:09

Wagner Cipriano

You can try the Anderson-Darling normality test, ad.test() from the nortest package, which works for larger sample sizes.

library(nortest)
ad.test(data$variable)
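To illustrate why this test scales to large vectors, here is a base-R sketch of the Anderson-Darling A² statistic for normality, with the mean and standard deviation estimated from the data. The function name ad_statistic is hypothetical, and it returns only the statistic, not a p-value, so nortest's ad.test() remains the practical choice. The whole computation is a single vectorized sum over the sorted, standardized data.

```r
# Sketch: Anderson-Darling A^2 statistic for normality, with mean and sd
# estimated from the data. Returns the statistic only, not a p-value.
ad_statistic <- function(x) {
  x <- sort(x[complete.cases(x)])   # drop missing values, sort ascending
  n <- length(x)
  if (n < 8) stop("need at least 8 non-missing values")  # guard chosen for this sketch
  z <- (x - mean(x)) / sd(x)        # standardize with estimated parameters
  i <- seq_len(n)
  log_cdf  <- pnorm(z, log.p = TRUE)                           # log Phi(z_i)
  log_surv <- pnorm(rev(z), lower.tail = FALSE, log.p = TRUE)  # log(1 - Phi(z_{n+1-i}))
  -n - mean((2 * i - 1) * (log_cdf + log_surv))
}

set.seed(7)
ad_statistic(rnorm(1e5))   # far more points than shapiro.test() allows
```

Computing the logarithms through pnorm(..., log.p = TRUE) rather than log(pnorm(...)) avoids taking log(0) in the far tails, which matters once the vector has millions of points.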

answered Sep 26 '22 19:09

VishnuVardhanA


Tags:

r

normal-distribution

Peter Pfand
