
glmnet error for logistic regression/binomial

I get this error when trying to fit a logistic regression with glmnet() and family="binomial":

> data <- read.csv("DAFMM_HE16_matrix.csv", header=F)
> x <- as.data.frame(data[,1:3])
> x <- model.matrix(~.,data=x)
> y <- data[,4]

> train=sample(1:dim(x)[1],287,replace=FALSE)

> xTrain=x[train,]
> xTest=x[-train,]
> yTrain=y[train]
> yTest=y[-train]

> fit = glmnet(xTrain,yTrain,family="binomial")

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  : 
one multinomial or binomial class has 1 or 0 observations; not allowed

Any help would be greatly appreciated. I've searched the internet and haven't been able to find anything that helps.

EDIT:

Here's what data looks like:

> data
          V1       V2    V3      V4
1   34927.00   156.60 20321  -12.60
2   34800.00   156.60 19811  -18.68
3   29255.00   156.60 19068    7.50
4   25787.00   156.60 19608    6.16
5   27809.00   156.60 24863   -0.87
...
356 26495.00 12973.43 11802    6.35
357 26595.00 12973.43 11802   14.28
358 26574.00 12973.43 11802    3.98
359 25343.00 14116.18 11802   -2.05
groutgauss asked May 01 '15 21:05



2 Answers

I think it is because of the levels of your factor variable. Suppose the variable has 10 levels and one of them has only a single record: try removing that level. You can use drop.levels() from the gdata package to drop the level once its record is gone.
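A quick way to check for this is to tabulate the response before fitting. Here is a minimal sketch on made-up data: the vector `y` is hypothetical, not taken from the asker's file, and it uses base R's `droplevels()` instead of the gdata function.

```r
# Hypothetical response in which class "c" has a single observation
y <- factor(c("a", "a", "a", "b", "b", "c"))

# Count observations per class; the error message says a class
# with 1 or 0 observations is not allowed
counts <- table(y)
print(counts)

# Keep only the rows whose class has at least 2 observations,
# then drop the now-empty level from the factor
keep <- y %in% names(counts)[counts >= 2]
y_clean <- droplevels(y[keep])
print(table(y_clean))
```

In a real fit the same `keep` index would also have to be applied to the rows of the predictor matrix, so that x and y stay aligned.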

prahlad answered Sep 28 '22 04:09


This error usually comes from the structure of the response variable. Either the response has more than two distinct values, or it is binary but one class heavily outnumbers the other (a class imbalance problem), so that a class can end up with one or no observations when the data is split into training and test sets. First, convert the response variable to binary if it has more than two outcomes. Second, you can fit with family="multinomial" instead of "binomial". Hope this helps.
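Judging from the rows printed in the question, V4 looks continuous rather than binary, so converting it might look like the sketch below. The cutoff at 0 is purely an assumption for illustration; the right rule depends on what V4 measures.

```r
# First few V4 values as printed in the question
v4 <- c(-12.60, -18.68, 7.50, 6.16, -0.87)

# Number of distinct response values; more than 2 means it is not binary
n_distinct <- length(unique(v4))
print(n_distinct)

# Hypothetical rule: class 1 if the value is positive, class 0 otherwise
y_bin <- factor(as.integer(v4 > 0), levels = c(0, 1))
print(table(y_bin))
```

After a conversion like this, it is still worth checking with `table()` that both classes appear more than once in the training subset.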

Muhammad Naeem answered Sep 28 '22 04:09