Here's the data and set up:
library(fitdistrplus)
library(gamlss)
finalVector <- dput(finalVector)
c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1,
1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1, 1, 1, 1, 1, 1, 2,
2, 1, 4, 2, 3, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 1, 2, 2, 1, 1, 4, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1,
2, 1, 1, 4, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1)
countFitPoisson <- fitdist(finalVector,"pois", method = "mle", lower = 0)
countFitZeroPoisson <- fitdist(finalVector, 'ZIP', start = list( ##mu = mean of poisson, sigma = prob(x = 0))
mu = as.numeric(countFitPoisson$estimate),
sigma = as.numeric(as.numeric(countFitPoisson$estimate))
), method = "mle", lower= 0)
The first call works successfully. The final says it failed to estimate and I'm not sure why. Thanks!
Edit:
Assuming I did the code correctly (not sure), then the only thing I can think of is that there are not enough zeros to fit the model?
Zero-inflated Poisson regression is used to model count data that has an excess of zero counts. Further, theory suggests that the excess zeros are generated by a separate process from the count values and that the excess zeros can be modeled independently.
Zero-inflation can cause overdispersion (but accounting for zero-inflation does not necessarily remove overdispersion). Two-part and mixture models for zero-inflated data (Table 11.1). Fundamental difference: In two-part models, the count process cannot produce zeros (the distribution is zero-truncated).
If the amount of observed zeros is larger than the amount of predicted zeros, the model is underfitting zeros, which indicates a zero-inflation in the data.
Zero-inflated and hurdle models are generally used in the setting of excess zeroes. Zero-inflated models are typically used if the data contains excess structural and sampling zeroes, whereas hurdle models are generally used when there are only excess sampling zeroes.
Your data is not really zero-inflated, hence fitting the model does not lead to an improvement. Rather than using the fitdistr
approach, I'm using glm
and extended regression models below. All regression (sub-)models just use a constant (or intercept) without any real regressors, though. For visualization I use rootograms via the countreg
package available from R-Forge (which contains the successor functions to the pscl
count data regressions).
First, let's look at the Poisson fit:
(mp <- glm(finalVector ~ 1, family = poisson))
## Call: glm(formula = finalVector ~ 1, family = poisson)
##
## Coefficients:
## (Intercept)
## -0.284
##
## Degrees of Freedom: 181 Total (i.e. Null); 181 Residual
## Null Deviance: 200.2
## Residual Deviance: 200.2 AIC: 418.3
This corresponds to a fitted mean of exp(-0.284)
, i.e., about 0.753. This fits the data very well if you compare observed and fitted frequencies:
library("countreg")
rootogram(mp)
This shows that the fit for the counts 0, 1, 2 is essentially perfect and there are only small deviations for 3, 4, 5 but these frequencies are extremely low anyway. So judging from this it appears that no extension of the model is necessary.
But to formally compare the model with others, one could consider a zero-inflated Poisson (as you tried) a hurdle Poisson or a negative binomial. The zero-inflated model yields:
(mzip <- zeroinfl(finalVector ~ 1 | 1, dist = "poisson"))
## Call:
## zeroinfl(formula = finalVector ~ 1 | 1, dist = "poisson")
##
## Count model coefficients (poisson with log link):
## (Intercept)
## -0.2839
##
## Zero-inflation model coefficients (binomial with logit link):
## (Intercept)
## -9.151
Thus, the count mean is essentially identical to before and the probability of zero-inflation is essentially zero (plogis(-9.151)
is about 0.01%).
The hurdle model works similarly but can use a zero-censored Poisson model for 0-vs-greater and a truncated Poisson for the positive counts. This is then also nested within the Poisson model and hence a Wald test can be carried easily out:
mhp <- hurdle(finalVector ~ 1 | 1, dist = "poisson", zero.dist = "poisson")
hurdletest(mhp)
## Wald test for hurdle models
##
## Restrictions:
## count_((Intercept) - zero_(Intercept) = 0
##
## Model 1: restricted model
## Model 2: finalVector ~ 1 | 1
##
## Res.Df Df Chisq Pr(>Chisq)
## 1 181
## 2 180 1 0.036 0.8495
This also clearly shows that there are no excess zeros and a simple Poisson model suffices.
As a final check one could also consider a negative binomial model:
(mnb <- glm.nb(finalVector ~ 1))
## Call: glm.nb(formula = finalVector ~ 1, init.theta = 125.8922776, link = log)
##
## Coefficients:
## (Intercept)
## -0.284
##
## Degrees of Freedom: 181 Total (i.e. Null); 181 Residual
## Null Deviance: 199.1
## Residual Deviance: 199.1 AIC: 420.3
This has again virtually the same mean and a huge theta parameter that is close enough to infinity (= Poisson). Thus, overall the Poisson model is simply sufficient and there is no need for any of the extensions considered. The likelihoods are almost unchanged and the additional parameters (zero-inflation, zero-hurdle, theta-dispersion) do not yield any improvement:
AIC(mp, mzip, mhp, mnb)
## df AIC
## mp 1 418.2993
## mzip 2 420.2996
## mhp 2 420.2631
## mnb 2 420.2959
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With