Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I use a custom link function in glm?

Tags:

r

glm

I don't want to use the standard log link in glm for Poisson regression, since I have zeros. Consider the following code:

foo = 0:10
bar = 2 * foo
glm(bar ~ foo, family = poisson(link = "identity"))

I get the error:

Error: no valid set of coefficients has been found: please supply starting values

I'm not certain what this means. Is the "identity" link function what I think it is (i.e. it doesn't transform the data at all)? What does this error mean and how can I resolve it?

like image 374
Xodarap Avatar asked Feb 17 '13 00:02

Xodarap


1 Answers

You can get an answer if you start somewhere other than the default (0,0) starting point. The start parameter is a vector containing the intercept and slope of the response, on the scale of the link function. The problem R is reporting is typically that the calculated (negative) log-likelihood becomes infinite for the starting values. You can check this for yourself: -sum(dpois(bar,0+0*foo,log=TRUE)) is Inf (because we are setting up a Poisson with zero mean, but get a non-zero response).

However, this isn't a complete explanation, because even for some starting points like (0,2) where the starting negative-log-likelihood is finite (-sum(dpois(bar,0+2*foo,log=TRUE)) is about 20), the same error occurs -- one would have to dig in deeper to see what's the matter, but I can imagine for instance that a Poisson mean of zero is not allowed at all in the code. The log-likelihood of the Poisson is (a constant plus) x*log(lambda)-lambda: even though this works out OK if lambda and x are both zero, that's not always obvious in the math. In particular, if you look at poisson()$validmu, which is the function that glm uses to establish whether a set of calculated means for the Poisson is OK, you'll see that its definition is function (mu) { all(mu > 0) }. (It would be possible to modify this to allow zero values for mu, but it would be enough trouble that you'd need a good reason to do so -- I tried it, and there's another problem because variances are then calculated to equal zero. In short, it would be easier to do this through a custom maximum likelihood estimator (e.g. bbmle::mle2()) than to hack glm to do it ...)

However, a starting point where there are no zero estimates of the Poisson mean works out fine, although there are plenty of warnings:

glm(bar ~ foo, family = poisson(link = "identity"), start=c(1,0))

However: I want to point out that you're misunderstanding the purpose of the link function. It's fine to have zeros in the response variable of a Poisson regression, even with a standard log link. The GLM model for a Poisson regression is y ~ Poisson(exp(a+b*x)), not log(y) = a + b*x. The latter is bad if y=0, but the former is perfectly OK. glm(bar ~ foo, family = poisson()) works just fine.

In general, non-canonical link functions are a bit of a pain: they're sometimes exactly what you need (although from what you've said I'm not convinced that this is true in your case), but they tend to be fussier and harder to fit than the canonical links.

One final note: I would probably refer to what you want as a "non-canonical" or "non-standard" link; a custom link function, for me, would be one that wasn't provided by the family() command in R, so you had to write the link function yourself (e.g. see http://rpubs.com/bbolker/4082 )

like image 156
Ben Bolker Avatar answered Sep 22 '22 06:09

Ben Bolker