Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Approximate the distribution of a sum of binomial random variables in R

My goal is approximate the distribution of a sum of binomial variables. I use the following paper The Distribution of a Sum of Binomial Random Variables by Ken Butler and Michael Stephens.

I want to write an R script to find Pearson approximation to the sum of binomials. There is an R-package PearsonDS that allows do this in a simple way.

So I take the first example from the paper and try to find density of the Pearson distribution for this case. Finally i get an error message "There are no probability distributions with these moments".

Could you please explain me what's wrong in the below code?

library(PearsonDS)

# define parameters for five binomial random varibles

n<-rep(5,5)
p<-seq(0.02,0.10,0.02)

# find the first four cumulants

k.1<-sum(n*p)
k.2<-sum(n*p*(1-p))
k.3<-sum(n*p*(1-p)*(1-2*p))
k.4<-sum(n*p*(1-p)*(1-6*p*(1-p)))

# find the skewness and kurtosis parameters

beta.1<-k.3^2/k.2^3
beta.2<-k.4/k.2^2

# define the moments and calculate

moments <- c(mean=k.1,variance=k.2,skewness=sqrt(beta.1),kurtosis=beta.2)
dpearson(1:7,moments=moments)

I get the error message "There are no probability distributions with these moments".

like image 744
Eugeny Chankov Avatar asked Nov 03 '22 00:11

Eugeny Chankov


1 Answers

What you try to insert as kurtosis in your moments, is actually the excess kurtosis, which is just kurtosis - 3. From the help-page of dpearson():

moments:
optional vector/list of mean, variance, skewness, kurtosis (not excess kurtosis).

So adding 3 to beta.2 will provide you with the real kurtosis:

beta.1 <- (k.3^2)/(k.2^3)
beta.2 <- k.4/(k.2^2)
kurt <- beta.2 + 3

moments <- c(mean = k.1, variance = k.2, skewness = beta.1, kurtosis = kurt)
dpearson(1:7, moments=moments)
# [1] 0.3438773545 0.2788412385 0.1295129534 0.0411140817 0.0099279576
# [6] 0.0019551512 0.0003294087

To get a result like the one in the paper, we should investigate the cumulative distribution function and add 0.5 to correct for the bias caused by approximating a discrete distribution by a continuous one:

ppearson(1:7+0.5, moments = moments)
# [1] 0.5348017 0.8104394 0.9430092 0.9865434 0.9973715 0.9995578 0.9999339

A little background information:

The function threw an error because the relationship between kurtosis and skewness wasn't invalid: kurtosis is lower-bounded by the skewness in the following way: kurtosis >= (skewness)^2 - 1. The proof ain't pretty and is certainly beyond the scope of the question, but you can check out the references below if you like for different versions of this inequality.

  1. Wilkins, J. Ernest. A Note on Skewness and Kurtosis. Ann. Math. Statist. 15 (1944), no. 3, 333--335. http://projecteuclid.org/euclid.aoms/1177731243.
  2. K. Pearson. Mathematical contributions to the theory of evolution, XIX; second supplement to a memoir on skew variation. Philos. Trans. Roy. Soc. London Ser. A, 216 (1916), p. 432 http://rsta.royalsocietypublishing.org/content/216/538-548/429
  3. Pearson, K. (1929). "Editorial note to 'Inequalities for moments of frequency functions and for various statistical constants'". Biometrika. 21 (1–4): 361–375. link
like image 102
KenHBS Avatar answered Nov 11 '22 14:11

KenHBS