
R: NA/NaN/Inf in X error

Tags: r

I am trying to fit a negative binomial regression in R. When I execute the following command:

 DV2.25112013.nb <- glm.nb(DV2.25112013~ Bcorp.Geographic.Proximity + Dirty.Industry +
                Clean.Industry + Bcorp.Industry.Density + State + Dirty.Region +
                Clean.Region + Bcorp.Geographic.Density + Founded.As.Bcorp + Centrality +
                Bcorp.Industry.Density.Squared + Bcorp.Geographic.Density.Squared +
                Regional.Institutionalization + Sales + Any.Best.In.Class +           
                Dirty.Region.Heterogeneity + Clean.Region.Heterogeneity + 
                Ind.Dirty.Heterogeneity+Ind.Clean.Heterogeneity + Industry, 
                data = analysis25112013DF6)

R gives the following error:

Error in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset,  : 
  NA/NaN/Inf in 'x'
In addition: Warning message:
step size truncated due to divergence 

I do not understand this error, since my data matrix does not contain any NA/NaN/Inf values. How can I fix this?

thank you,

Jin-Dominique asked Dec 27 '13 23:12


2 Answers

I think the most likely cause of this error is negative values or zeros in the data, since the default link in glm.nb is 'log'. That is easy enough to test by changing to link = "identity". I also think you need to try smaller models, maybe a quarter of those variables to start. That also lets you add related variables in bundles, since the names suggest a potentially severe collinearity problem among the categorical variables.
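A quick way to test for the conditions described above is to inspect the response and the design matrix before fitting. The following is only a sketch: `check_nb_inputs` is a hypothetical helper name (not part of MASS), and it is demonstrated on the `quine` data rather than on the data from the question.

```r
library(MASS)  # provides glm.nb and the quine example data

# Hypothetical helper: count values that commonly break a log-link count model.
check_nb_inputs <- function(formula, data) {
  # Keep rows with NA so that they can be counted instead of silently dropped.
  mf <- model.frame(formula, data = data, na.action = na.pass)
  y  <- model.response(mf)
  X  <- model.matrix(formula, data = mf)
  all_finite <- all(is.finite(X))
  list(
    negative_response  = sum(y < 0, na.rm = TRUE),
    zero_response      = sum(y == 0, na.rm = TRUE),
    nonfinite_response = sum(!is.finite(y)),
    nonfinite_design   = colSums(!is.finite(X)),
    # The rank check is only meaningful (and only runs) on a finite matrix.
    rank_deficient     = if (all_finite) qr(X)$rank < ncol(X) else NA
  )
}

# Usage: corrupt one response value, then inspect the report.
bad <- quine
bad$Days[1] <- -2
str(check_nb_inputs(Days ~ Sex/(Age + Eth*Lrn), data = bad))
```

A clean report does not guarantee a successful fit, because the fitting iterations can still diverge on finite data, but it rules out the simplest causes.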

We really need a description of the data. I wondered about Dirty.Industry + Clean.Industry. That sort of dichotomy is better handled with a single factor variable that has those levels, which prevents collinearity if Clean = not-Dirty. The same may apply to your "Heterogeneity" variables. (I'm not convinced that @BenBolker's comment is correct. I think it quite possible that you need statistical consultation before addressing coding issues.)

require(MASS)
data(quine)  # following example in ?glm.nb page

> quine$Days[1] <- -2

> quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine, link = "identity")
Error in eval(expr, envir, enclos) : 
  negative values not allowed for the 'Poisson' family

> quine$Days[1] <- 0
> quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine, link = "identity")
Error: no valid set of coefficients has been found: please supply starting values
In addition: Warning message:
In log(y/mu) : NaNs produced
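
The earlier suggestion to replace paired indicators such as Dirty.Industry and Clean.Industry with one factor could look roughly like this. The data frame is a made-up toy example, and the column name `Industry.Type` and the level "Neither" are assumptions for illustration, since the real coding of those variables is not shown in the question.

```r
# Hypothetical toy data: two 0/1 indicator columns like those in the question.
df <- data.frame(
  Dirty.Industry = c(1, 0, 0, 1, 0),
  Clean.Industry = c(0, 1, 0, 0, 1)
)

# Collapse the indicators into one factor; rows with neither flag become "Neither".
df$Industry.Type <- factor(
  ifelse(df$Dirty.Industry == 1, "Dirty",
         ifelse(df$Clean.Industry == 1, "Clean", "Neither")),
  levels = c("Neither", "Clean", "Dirty")
)

table(df$Industry.Type)

# The design matrix has an intercept plus one column per non-reference level,
# so the two industry effects can no longer be exact complements of each other.
colnames(model.matrix(~ Industry.Type, data = df))
```

Using the factor in the formula in place of the two separate terms leaves the choice of reference level to `levels =`, which also makes the coefficients easier to interpret.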
IRTFM answered Oct 13 '22 07:10


I resolved this issue by passing a control argument to the model call with a lower iteration limit, maxit = 10 or lower (the argument in glm.control() is named maxit, and its default is 25 iterations). It may also work for you with a few more iterations. Just try it.
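
As a minimal sketch of that approach, assuming the limit is passed through glm.control() (where the argument is spelled `maxit`), the call might look like this. It uses the `quine` data from MASS and a simplified formula, not the model from the question.

```r
library(MASS)  # provides glm.nb and the quine example data

# Fit with the default control settings.
fit_default <- glm.nb(Days ~ Sex + Age + Eth + Lrn, data = quine)

# Same model with a lower cap on the number of fitting iterations.
fit_capped <- glm.nb(Days ~ Sex + Age + Eth + Lrn, data = quine,
                     control = glm.control(maxit = 10))

# Compare the estimated dispersion parameter from both fits.
c(default = fit_default$theta, capped = fit_capped$theta)
```

Capping the iterations can stop a diverging fit before it produces non-finite values, but it treats the symptom; if the estimates change noticeably with the cap, the model specification probably needs another look.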

Raul answered Oct 13 '22 08:10