Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Zero inflated poisson model fails to fit

Tags:

r

poisson

Here's the data and set up:

library(fitdistrplus)
library(gamlss)

finalVector <- dput(finalVector)
c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 
1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1, 1, 1, 1, 1, 1, 2, 
2, 1, 4, 2, 3, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
2, 1, 2, 2, 1, 1, 4, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 
2, 1, 1, 4, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1)

countFitPoisson <- fitdist(finalVector,"pois",  method = "mle", lower = 0)
countFitZeroPoisson <- fitdist(finalVector, 'ZIP', start = list( ##mu  = mean of poisson, sigma = prob(x = 0))
                                                           mu = as.numeric(countFitPoisson$estimate), 
                                                            sigma = as.numeric(as.numeric(countFitPoisson$estimate))
                                                          ), method = "mle", lower= 0) 

The first call works successfully. The final says it failed to estimate and I'm not sure why. Thanks!

Edit:

Assuming I did the code correctly (not sure), then the only thing I can think of is that there are not enough zeros to fit the model?

like image 533
user1357015 Avatar asked Jun 24 '15 18:06

user1357015


People also ask

When should you use a zero-inflated model?

Zero-inflated Poisson regression is used to model count data that has an excess of zero counts. Further, theory suggests that the excess zeros are generated by a separate process from the count values and that the excess zeros can be modeled independently.

Does zero-inflation cause overdispersion?

Zero-inflation can cause overdispersion (but accounting for zero-inflation does not necessarily remove overdispersion). Two-part and mixture models for zero-inflated data (Table 11.1). Fundamental difference: In two-part models, the count process cannot produce zeros (the distribution is zero-truncated).

How do you know if data is zero-inflated?

If the amount of observed zeros is larger than the amount of predicted zeros, the model is underfitting zeros, which indicates a zero-inflation in the data.

What is the difference between zero-inflated and hurdle models?

Zero-inflated and hurdle models are generally used in the setting of excess zeroes. Zero-inflated models are typically used if the data contains excess structural and sampling zeroes, whereas hurdle models are generally used when there are only excess sampling zeroes.


1 Answers

Your data is not really zero-inflated, hence fitting the model does not lead to an improvement. Rather than using the fitdistr approach, I'm using glm and extended regression models below. All regression (sub-)models just use a constant (or intercept) without any real regressors, though. For visualization I use rootograms via the countreg package available from R-Forge (which contains the successor functions to the pscl count data regressions).

First, let's look at the Poisson fit:

(mp <- glm(finalVector ~ 1, family = poisson))
## Call:  glm(formula = finalVector ~ 1, family = poisson)
## 
## Coefficients:
## (Intercept)  
##      -0.284  
## 
## Degrees of Freedom: 181 Total (i.e. Null);  181 Residual
## Null Deviance:      200.2 
## Residual Deviance: 200.2        AIC: 418.3

This corresponds to a fitted mean of exp(-0.284), i.e., about 0.753. This fits the data very well if you compare observed and fitted frequencies:

library("countreg")
rootogram(mp)

This shows that the fit for the counts 0, 1, 2 is essentially perfect and there are only small deviations for 3, 4, 5 but these frequencies are extremely low anyway. So judging from this it appears that no extension of the model is necessary.

rootogram

But to formally compare the model with others, one could consider a zero-inflated Poisson (as you tried) a hurdle Poisson or a negative binomial. The zero-inflated model yields:

(mzip <- zeroinfl(finalVector ~ 1 | 1, dist = "poisson"))
## Call:
## zeroinfl(formula = finalVector ~ 1 | 1, dist = "poisson")
## 
## Count model coefficients (poisson with log link):
## (Intercept)  
##     -0.2839  
## 
## Zero-inflation model coefficients (binomial with logit link):
## (Intercept)  
##      -9.151  

Thus, the count mean is essentially identical to before and the probability of zero-inflation is essentially zero (plogis(-9.151) is about 0.01%).

The hurdle model works similarly but can use a zero-censored Poisson model for 0-vs-greater and a truncated Poisson for the positive counts. This is then also nested within the Poisson model and hence a Wald test can be carried easily out:

mhp <- hurdle(finalVector ~ 1 | 1, dist = "poisson", zero.dist = "poisson")
hurdletest(mhp)
## Wald test for hurdle models
## 
## Restrictions:
## count_((Intercept) - zero_(Intercept) = 0
## 
## Model 1: restricted model
## Model 2: finalVector ~ 1 | 1
## 
##   Res.Df Df Chisq Pr(>Chisq)
## 1    181                    
## 2    180  1 0.036     0.8495

This also clearly shows that there are no excess zeros and a simple Poisson model suffices.

As a final check one could also consider a negative binomial model:

(mnb <- glm.nb(finalVector ~ 1))
## Call:  glm.nb(formula = finalVector ~ 1, init.theta = 125.8922776, link = log)
## 
## Coefficients:
## (Intercept)  
##      -0.284  
## 
## Degrees of Freedom: 181 Total (i.e. Null);  181 Residual
## Null Deviance:      199.1 
## Residual Deviance: 199.1        AIC: 420.3

This has again virtually the same mean and a huge theta parameter that is close enough to infinity (= Poisson). Thus, overall the Poisson model is simply sufficient and there is no need for any of the extensions considered. The likelihoods are almost unchanged and the additional parameters (zero-inflation, zero-hurdle, theta-dispersion) do not yield any improvement:

AIC(mp, mzip, mhp, mnb)
##      df      AIC
## mp    1 418.2993
## mzip  2 420.2996
## mhp   2 420.2631
## mnb   2 420.2959
like image 131
Achim Zeileis Avatar answered Oct 07 '22 03:10

Achim Zeileis