Trying to compare two distributions

I found this code on the internet that compares a normal distribution to several Student's t distributions:

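x <- seq(-4, 4, length.out=100)   # grid of x values
hx <- dnorm(x)                    # standard normal density

degf <- c(1, 3, 8, 30)            # degrees of freedom for the t curves
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c("df=1", "df=3", "df=8", "df=30", "normal")

# normal density drawn as a dashed black line
plot(x, hx, type="l", lty=2, xlab="x value",
  ylab="Density", main="Comparison of t Distributions")

# overlay one t density per degrees-of-freedom value
for (i in seq_along(degf)){
  lines(x, dt(x, degf[i]), lwd=2, col=colors[i])
}

# legend that uses the labels and the fifth color defined above
legend("topright", inset=.05, title="Distributions",
  labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)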

I would like to adapt this so that I can compare my own data to a normal distribution. This is my data:

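library(quantmod)
# download the NDX index prices from Yahoo
getSymbols("^NDX", src="yahoo", from="1997-06-01", to="2012-06-01")
daily <- allReturns(NDX)[, "daily"]     # keep only the daily returns
dailySerieTemporel <- ts(data=daily)    # convert to a ts object
ss <- na.omit(dailySerieTemporel)       # drop missing values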

The objective is to see whether my data is normally distributed or not. Can someone help me out a bit with this? Thank you very much, I really appreciate it!

asked Aug 05 '12 22:08

jeremy.staub

2 Answers

If you are only concerned with knowing whether your data is normally distributed, you can apply the Jarque-Bera test. Under the null hypothesis of this test, your data is normally distributed; see details here. You can perform the test using the jarque.bera.test function from the tseries package.

 library(tseries)
 jarque.bera.test(ss)

    Jarque Bera Test

data:  ss 
X-squared = 4100.781, df = 2, p-value < 2.2e-16

From the result, you can see that your data is not normally distributed, since the null hypothesis is rejected even at the 1% level.

To see why your data is not normally distributed, you can take a look at the descriptive statistics:

 library(fBasics)
 basicStats(ss)
                     ss
nobs        3776.000000
NAs            0.000000
Minimum       -0.105195
Maximum        0.187713
1. Quartile   -0.009417
3. Quartile    0.010220
Mean           0.000462
Median         0.001224
Sum            1.745798
SE Mean        0.000336
LCL Mean      -0.000197
UCL Mean       0.001122
Variance       0.000427
Stdev          0.020671
Skewness       0.322820
Kurtosis       5.060026

From the last two rows, you can see that ss has excess kurtosis and that its skewness is not zero. These two quantities are the basis of the Jarque-Bera test.
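To make the link between skewness, kurtosis and the test explicit, here is a minimal sketch of the Jarque-Bera statistic computed by hand as JB = n/6 * (S^2 + K^2/4), where S is the sample skewness and K the sample excess kurtosis. The series ss_demo is a simulated stand-in (an assumption, so the snippet runs without downloading anything); replace it with ss to use the real returns.

```r
# Stand-in for `ss`: simulated heavy-tailed "returns" (assumption, not the NDX data)
set.seed(1)
ss_demo <- 0.02 * rt(3776, df = 5)

n  <- length(ss_demo)
d  <- ss_demo - mean(ss_demo)
m2 <- mean(d^2)                  # central moments, divisor n
m3 <- mean(d^3)
m4 <- mean(d^4)

skew        <- m3 / m2^1.5       # sample skewness (0 for a normal)
excess_kurt <- m4 / m2^2 - 3     # sample excess kurtosis (0 for a normal)

# Jarque-Bera statistic: grows with squared skewness and squared excess kurtosis
jb      <- n / 6 * (skew^2 + excess_kurt^2 / 4)
# asymptotic p-value from a chi-squared distribution with 2 degrees of freedom
p_value <- pchisq(jb, df = 2, lower.tail = FALSE)

print(c(statistic = jb, p.value = p_value))
```

Plugging the rounded values from the table above into the same formula, 3776/6 * (0.3228^2 + 5.0600^2/4), gives about 4094, which is close to the 4100.781 reported by jarque.bera.test. The small gap is presumably down to the variance divisor used when the moments are computed.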

If instead you want to compare the actual distribution of your data against a normally distributed random variable with the same mean and variance, first estimate the empirical density of your data using a kernel and plot it. Then generate a normal random sample with the same mean and variance as your data and overlay its estimated density, like this:

 plot(density(ss, kernel='epanechnikov'))
 set.seed(125)
 lines(density(rnorm(length(ss), mean(ss), sd(ss)), kernel='epanechnikov'), col=2)

In the same way, you can overlay a curve generated from any other probability distribution.
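As a variation, the reference curve does not have to come from a simulated sample: the theoretical normal density can be drawn directly with dnorm, which is closer to the code in the question. The following is only a sketch; ss_demo is a simulated stand-in for ss (an assumption), and the titles and legend text are illustrative choices.

```r
# Stand-in for `ss`: simulated heavy-tailed "returns" (assumption, not the NDX data)
set.seed(125)
ss_demo <- 0.02 * rt(3776, df = 5)

# kernel estimate of the empirical density
dens <- density(ss_demo, kernel = "epanechnikov")

# exact normal density with the same mean and sd, evaluated on the same grid
grid       <- dens$x
norm_curve <- dnorm(grid, mean = mean(ss_demo), sd = sd(ss_demo))

plot(dens, main = "Empirical density vs. fitted normal", xlab = "Daily return",
     ylim = range(dens$y, norm_curve))
lines(grid, norm_curve, col = 2, lty = 2, lwd = 2)
legend("topright", legend = c("kernel estimate", "normal, same mean and sd"),
       col = c(1, 2), lty = c(1, 2), lwd = c(1, 2))
```

Swapping dnorm for another density function (dt, for example, after rescaling) would give the comparison against a different distribution.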

The tests suggested by @Alex Reynolds will help you if you want to know which distribution your data might have been drawn from. If that is your goal, you can look up any goodness-of-fit test in a statistics textbook. If you just want to know whether your variable is normally distributed, the Jarque-Bera test is good enough.

answered Sep 22 '22 20:09

Jilber Urbina

Take a look at Q-Q plots and the Shapiro-Wilk or Kolmogorov-Smirnov (K-S) tests to see if your data are normally distributed.
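A minimal sketch of these three checks using base R functions (qqnorm, qqline, shapiro.test and ks.test from the stats package). The series ss_demo is a simulated stand-in for ss (an assumption, so the snippet runs on its own).

```r
# Stand-in for `ss`: simulated heavy-tailed "returns" (assumption, not the NDX data)
set.seed(42)
ss_demo <- 0.02 * rt(3776, df = 5)

# Q-Q plot: points far from the reference line in the tails indicate non-normality
qqnorm(ss_demo)
qqline(ss_demo, col = 2)

# Shapiro-Wilk test (accepts between 3 and 5000 observations)
sw <- shapiro.test(ss_demo)

# Kolmogorov-Smirnov test against a normal with the sample mean and sd.
# Caveat: the parameters are estimated from the same data, so this p-value
# is only approximate.
ks <- ks.test(ss_demo, "pnorm", mean = mean(ss_demo), sd = sd(ss_demo))

print(sw)
print(ks)
```

In both tests the null hypothesis is normality, so a small p-value is evidence against it.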

answered Sep 25 '22 20:09

Alex Reynolds

Tags: r, statistics, finance
