Zero inflated poisson model fails to fit

Tags:

poisson

Here's the data and set up:

library(fitdistrplus)
library(gamlss)

finalVector <- dput(finalVector)
c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 
1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1, 1, 1, 1, 1, 1, 2, 
2, 1, 4, 2, 3, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
2, 1, 2, 2, 1, 1, 4, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 
2, 1, 1, 4, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1)

countFitPoisson <- fitdist(finalVector,"pois",  method = "mle", lower = 0)
countFitZeroPoisson <- fitdist(finalVector, 'ZIP', start = list( ##mu  = mean of poisson, sigma = prob(x = 0))
                                                           mu = as.numeric(countFitPoisson$estimate), 
                                                            sigma = as.numeric(as.numeric(countFitPoisson$estimate))
                                                          ), method = "mle", lower= 0)

The first call works successfully. The final says it failed to estimate and I'm not sure why. Thanks!

Edit:

Assuming I did the code correctly (not sure), then the only thing I can think of is that there are not enough zeros to fit the model?

533

asked Jun 24 '15 18:06

1 Answers

Your data is not really zero-inflated, hence fitting the model does not lead to an improvement. Rather than using the fitdistr approach, I'm using glm and extended regression models below. All regression (sub-)models just use a constant (or intercept) without any real regressors, though. For visualization I use rootograms via the countreg package available from R-Forge (which contains the successor functions to the pscl count data regressions).

First, let's look at the Poisson fit:

(mp <- glm(finalVector ~ 1, family = poisson))
## Call:  glm(formula = finalVector ~ 1, family = poisson)
## 
## Coefficients:
## (Intercept)  
##      -0.284  
## 
## Degrees of Freedom: 181 Total (i.e. Null);  181 Residual
## Null Deviance:      200.2 
## Residual Deviance: 200.2        AIC: 418.3

This corresponds to a fitted mean of exp(-0.284), i.e., about 0.753. This fits the data very well if you compare observed and fitted frequencies:

library("countreg")
rootogram(mp)

This shows that the fit for the counts 0, 1, 2 is essentially perfect and there are only small deviations for 3, 4, 5 but these frequencies are extremely low anyway. So judging from this it appears that no extension of the model is necessary.

rootogram

But to formally compare the model with others, one could consider a zero-inflated Poisson (as you tried) a hurdle Poisson or a negative binomial. The zero-inflated model yields:

(mzip <- zeroinfl(finalVector ~ 1 | 1, dist = "poisson"))
## Call:
## zeroinfl(formula = finalVector ~ 1 | 1, dist = "poisson")
## 
## Count model coefficients (poisson with log link):
## (Intercept)  
##     -0.2839  
## 
## Zero-inflation model coefficients (binomial with logit link):
## (Intercept)  
##      -9.151

Thus, the count mean is essentially identical to before and the probability of zero-inflation is essentially zero (plogis(-9.151) is about 0.01%).

The hurdle model works similarly but can use a zero-censored Poisson model for 0-vs-greater and a truncated Poisson for the positive counts. This is then also nested within the Poisson model and hence a Wald test can be carried easily out:

mhp <- hurdle(finalVector ~ 1 | 1, dist = "poisson", zero.dist = "poisson")
hurdletest(mhp)
## Wald test for hurdle models
## 
## Restrictions:
## count_((Intercept) - zero_(Intercept) = 0
## 
## Model 1: restricted model
## Model 2: finalVector ~ 1 | 1
## 
##   Res.Df Df Chisq Pr(>Chisq)
## 1    181                    
## 2    180  1 0.036     0.8495

This also clearly shows that there are no excess zeros and a simple Poisson model suffices.

As a final check one could also consider a negative binomial model:

(mnb <- glm.nb(finalVector ~ 1))
## Call:  glm.nb(formula = finalVector ~ 1, init.theta = 125.8922776, link = log)
## 
## Coefficients:
## (Intercept)  
##      -0.284  
## 
## Degrees of Freedom: 181 Total (i.e. Null);  181 Residual
## Null Deviance:      199.1 
## Residual Deviance: 199.1        AIC: 420.3

This has again virtually the same mean and a huge theta parameter that is close enough to infinity (= Poisson). Thus, overall the Poisson model is simply sufficient and there is no need for any of the extensions considered. The likelihoods are almost unchanged and the additional parameters (zero-inflation, zero-hurdle, theta-dispersion) do not yield any improvement:

AIC(mp, mzip, mhp, mnb)
##      df      AIC
## mp    1 418.2993
## mzip  2 420.2996
## mhp   2 420.2631
## mnb   2 420.2959

131

answered Oct 07 '22 03:10

Achim Zeileis

Related questions
                            
                                R: setting multiple plot heights with par
                            
                                Is there something in R for automatic conversion of column( of data frame or table) into its original vector type
                            
                                Aggregating in R
                            
                                Interpolate values from a grid efficiently in R
                            
                                Overlay density plot excludes histogram values
                            
                                How to create a distance matrix for clustering using correlation instead of euclidean distance in R? [closed]
                            
                                R: Rearrange specific rows of single-column data frame into new column
                            
                                Printing too big legend box in RStudio
                            
                                In Scan EOF error while reading CSV file
                            
                                Flag all rows in a group using data table in R if some rows meet a condition
                            
                                Adding column of predicted Hazard Ratio to dataframe after Cox Regression in R
                            
                                Install some parts from Github when calling "install.packages()" in R
                            
                                Scatterplot of Year-On-Year Correlation of Data in R using ggplot2
                            
                                Creating compound/interacted dummy variables in data.table in R
                            
                                What is python's not? A special function type?
                            
                                Draw geom_smooth only for fits that are significant
                            
                                How to work with huge matrices in R? [closed]
                            
                                Export R package documentation to a web page
                            
                                Load dataset from "R" package using data(), assign it directly to a variable?
                            
                                copy a list of data.tables

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Zero inflated poisson model fails to fit

Tags:

r

poisson

user1357015

People also ask

1 Answers

Achim Zeileis

Recent Activity

Donate For Us