
R - cv.glmnet error: matrices must have same number of columns

Tags: r, glmnet

When running the cv.glmnet function from the glmnet package in R on large sparse datasets, I often get the following error:

# Error: Matrices must have same number of columns in .local(x, y, ...)

I have replicated the error with randomly generated data:

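library(glmnet)

set.seed(10)

# 1000 x 5 binary design matrix; column 1 is zero except for a single 1
X <- matrix(rbinom(5000, 1, 0.1), nrow=1000, ncol=5)
X[, 1] <- 0
X[1, 1] <- 1

# Binary response with 20 positives
Y <- rep(0, 1000)
Y[c(1:20)] <- 1

model <- cv.glmnet(x=X, y=Y, family="binomial", alpha=0.9, standardize=TRUE,
                   nfolds=4)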

This might be related to the initial variable screening (based on the inner product of X and Y). Instead of fixing a coefficient at zero, glmnet seems to drop the variable from the X matrix, and this is done separately for each cross-validation fold. If a variable is dropped in some folds and kept in others, the error appears.
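One way to see why the folds could disagree is to look at column 1 in each fold's training rows. The sketch below uses only base R and a hypothetical fold assignment (`foldid` is drawn by hand here; cv.glmnet draws its own unless one is supplied). It does not reproduce glmnet's internal screening. It only shows that the training rows of one fold contain no non-zero entry in column 1, while the others contain exactly one.

```r
set.seed(10)

X <- matrix(rbinom(5000, 1, 0.1), nrow = 1000, ncol = 5)
X[, 1] <- 0
X[1, 1] <- 1

nfolds <- 4

# Hypothetical fold assignment, for illustration only
foldid <- sample(rep(seq_len(nfolds), length.out = nrow(X)))

# For each fold k, count the non-zero entries of column 1
# among the rows used for training (all rows not in fold k)
nonzero_in_training <- sapply(seq_len(nfolds), function(k) {
  sum(X[foldid != k, 1] != 0)
})

print(data.frame(fold = seq_len(nfolds), nonzero_in_training))
```

The fold that holds row 1 out is trained on a column of all zeros, so any per-fold handling of constant columns would treat column 1 differently in that fold than in the rest.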

Sometimes increasing nfolds helps, which is in line with this hypothesis: a higher nfolds means larger training subsets and a smaller chance of dropping the variable in any of them.

A few additional notes:

The error appears only for alpha close to 1 (alpha=1 is pure L1, i.e. lasso, regularization) and with standardization turned on. It does not appear for family="gaussian".

What do you think could be happening?

Asked by Vainius, Mar 14 '14 11:03


1 Answer

This example is problematic because one variable has a single 1 and is zero everywhere else. This is a case where logistic regression can diverge (if not regularized), since driving that coefficient to infinity (plus or minus, depending on the response) will predict that observation perfectly without affecting anything else.
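The divergence described above can be seen with an unregularized fit in base R. This is a minimal sketch, not part of the original answer: it fits `glm` on the question's response using only the problematic column (the names `d` and `fit` are illustrative). Because the single observation with x1 = 1 has response 1, the coefficient is pushed toward plus infinity and stops only where the iterations stop.

```r
set.seed(10)

X <- matrix(rbinom(5000, 1, 0.1), nrow = 1000, ncol = 5)
X[, 1] <- 0
X[1, 1] <- 1

Y <- rep(0, 1000)
Y[1:20] <- 1

# Unregularized logistic regression on the single-1 column only
d <- data.frame(y = Y, x1 = X[, 1])
fit <- glm(y ~ x1, family = binomial, data = d)

# The x1 estimate is large and its standard error is much larger still,
# which is the usual symptom of a coefficient with no finite maximum
print(coef(summary(fit))["x1", c("Estimate", "Std. Error")])
```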

Now, the model is regularized, so this should not happen, but it does cause problems. I found that making alpha smaller (toward ridge; 0.5 for this example) made the problem go away.
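A quick way to compare the two alpha values on the question's data is sketched below. The helper `try_alpha` is hypothetical and simply reports whether cv.glmnet returned or raised an error. Results may vary with the installed glmnet version and with the random fold assignment, so no particular outcome is asserted here.

```r
set.seed(10)

X <- matrix(rbinom(5000, 1, 0.1), nrow = 1000, ncol = 5)
X[, 1] <- 0
X[1, 1] <- 1

Y <- rep(0, 1000)
Y[1:20] <- 1

# Hypothetical helper: run cv.glmnet for one alpha and report the outcome as text.
# Any error (including glmnet not being installed) is returned as a message.
try_alpha <- function(alpha) {
  tryCatch({
    glmnet::cv.glmnet(x = X, y = Y, family = "binomial",
                      alpha = alpha, standardize = TRUE, nfolds = 4)
    "ok"
  }, error = function(e) paste("error:", conditionMessage(e)))
}

alphas <- c(0.9, 0.5)
outcome <- vapply(alphas, try_alpha, character(1))

print(data.frame(alpha = alphas, outcome = outcome))
```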

The real problem here has to do with the lambda sequence used for each fold, but this gets a little technical. I will try to make a fix to cv.glmnet that makes this problem go away.

Trevor Hastie (glmnet maintainer)

Answered by Trevor Hastie, Nov 20 '22 09:11