Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to specify log link in glmnet?

Tags:

r

glmnet

I'm running an elastic net on a generalized linear model with the glmnet and caret packages in R.

My response variable is cost (where cost > $0) and hence I'd like to specify a Gaussian family with a log link for my GLM. However glmnet doesn't seem to allow me to specify (link="log") as follows:

> lasso_fit <- glmnet(x, y, alpha=1, family="gaussian"(link="log"), lambda.min.ratio=.001)

I've tried different variants, with and without quotations, but no luck. The glmnet documentation doesn't discuss how to include a log link.

Am I missing something? Does family="gaussian" already implicitly assume a log link?

like image 208
RobertF Avatar asked Aug 08 '14 15:08

RobertF


1 Answers

It is a bit confusing, but the family argument in glmnet and glm are quite different. In glm, you can specify a character like "gaussian", or you can specify a function with some arguments, like gaussian(link="log"). In glmnet, you can only specify the family with a character, like "gaussian", and there is no way to automatically set the link through that argument.

The default link for gaussian is the identity function, that is, no transformation at all. But, remember that a link function is just a transformation of your y variable; you can just specify it yourself:

glmnet(x, log(y), family="gaussian")

Also note that the default link for the poisson family is log, but the objective function will change. See the Details under ?glmnet in the first couple of paragraphs.


Your comments have led me to rethink my answer; I have evidence that it is not correct.

As you point out, there is a difference between E[log(Y)] and log(E[Y]). I think what the above code does is to fit E[log(Y)], which is not what you want. Here is some code to generate data and confirm what you noted in the comments:

# Generate data
set.seed(1)
x <- replicate(3,runif(1000))
y <- exp(2*x[,1] + 3*x[,2] + x[,3] + runif(1000))
df <- data.frame(y,x)

# Run the model you *want*
glm(y~., family=gaussian(link="log"), data=df)$coef
# (Intercept)          X1          X2          X3 
#   0.4977746   2.0449443   3.0812333   0.9451073 

# Run the model you *don't want* (in two ways)    
glm(log(y)~., family=gaussian(link='identity'), data=df)$coef
# (Intercept)          X1          X2          X3 
#   0.4726745   2.0395798   3.0167274   0.9957110 
lm(log(y)~.,data=df)$coef
# (Intercept)          X1          X2          X3 
#   0.4726745   2.0395798   3.0167274   0.9957110 

# Run the glmnet code that I suggested - getting what you *don't want*.
library(glmnet)
glmnet.model <- glmnet(x,log(y),family="gaussian", thresh=1e-8, lambda=0)
c(glmnet.model$a0, glmnet.model$beta[,1])
#        s0        V1        V2        V3 
# 0.4726745 2.0395798 3.0167274 0.9957110 
like image 62
nograpes Avatar answered Nov 15 '22 18:11

nograpes