I get this error when trying to fit glmnet() with family="binomial" for a logistic regression:
> data <- read.csv("DAFMM_HE16_matrix.csv", header=F)
> x <- as.data.frame(data[,1:3])
> x <- model.matrix(~.,data=x)
> y <- data[,4]
> train=sample(1:dim(x)[1],287,replace=FALSE)
> xTrain=x[train,]
> xTest=x[-train,]
> yTrain=y[train]
> yTest=y[-train]
> fit = glmnet(xTrain,yTrain,family="binomial")
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
one multinomial or binomial class has 1 or 0 observations; not allowed
Any help would be greatly appreciated. I've searched the internet and haven't been able to find anything that helps.
EDIT:
Here's what data looks like:
> data
V1 V2 V3 V4
1 34927.00 156.60 20321 -12.60
2 34800.00 156.60 19811 -18.68
3 29255.00 156.60 19068 7.50
4 25787.00 156.60 19608 6.16
5 27809.00 156.60 24863 -0.87
...
356 26495.00 12973.43 11802 6.35
357 26595.00 12973.43 11802 14.28
358 26574.00 12973.43 11802 3.98
359 25343.00 14116.18 11802 -2.05
cv.glmnet() performs cross-validation, 10-fold by default; the number of folds can be adjusted with nfolds. A 10-fold CV randomly divides your observations into 10 non-overlapping groups (folds) of approximately equal size. Each fold in turn is used as the validation set while the model is fit on the other 9 folds.
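To make the fold idea concrete, here is a minimal sketch of how 359 observations could be split into 10 folds by hand. The predictors and the binary response are simulated stand-ins, not the poster's data, and the manual fold assignment is only an illustration of the idea, not necessarily what cv.glmnet() does internally. The cv.glmnet() call is guarded so the snippet still runs if glmnet is not installed.

```r
set.seed(1)
n <- 359        # number of rows in the question's data
nfolds <- 10

# Assign each observation to one of 10 folds of roughly equal size
foldid <- sample(rep(seq_len(nfolds), length.out = n))
foldSizes <- table(foldid)
print(foldSizes)

# Fold k is the validation set; the remaining 9 folds are used for fitting
k <- 1
valIdx <- which(foldid == k)
trainIdx <- which(foldid != k)

# Simulated stand-in data (assumption: NOT the poster's data)
x <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
y <- factor(rbinom(n, size = 1, prob = 0.5))

# Pass the same folds to cv.glmnet() via its foldid argument
if (requireNamespace("glmnet", quietly = TRUE)) {
  cvFit <- glmnet::cv.glmnet(x, y, family = "binomial", foldid = foldid)
  print(cvFit$lambda.min)
}
```

Supplying foldid yourself makes the split reproducible; otherwise nfolds alone is enough.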
Glmnet is a package that fits generalized linear and similar models via penalized maximum likelihood. The regularization path is computed for the lasso or elastic net penalty at a grid of values (on the log scale) for the regularization parameter lambda.
I think it is because of the levels of your factor variable. Suppose it has 10 levels and one of them has only a single record; try removing that level. You can use drop.levels() from the gdata package.
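As a rough sketch of that suggestion, the snippet below finds classes with a single record, removes those rows, and drops the now-empty levels. It uses base R's droplevels() rather than the gdata function, and the data frame df is a made-up toy example, not the poster's data.

```r
# Toy data (hypothetical): class "c" has only one record
df <- data.frame(
  x = c(1.2, 0.7, 3.1, 2.4, 0.9, 5.0),
  y = factor(c("a", "a", "a", "b", "b", "c"))
)

# Count records per class and find classes with 1 or 0 observations
counts <- table(df$y)
rare <- names(counts)[counts <= 1]

# Remove the rows belonging to rare classes, then drop the unused levels
dfKept <- df[!(df$y %in% rare), ]
dfKept$y <- droplevels(dfKept$y)

print(table(dfKept$y))
```

Checking table() on the training subset (for example table(yTrain)) before fitting shows directly whether any class is too small.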
This is generally caused by the structure of the data and the response variable. Sometimes the response has more than two distinct outcomes. In other cases the response is binary, but one class has far more observations than the other (a class imbalance problem), so after splitting into training and test sets one class can end up with too few observations. So, first, convert the response variable to binary if it has more than two outcomes; second, you may use family="multinomial" instead of family="binomial". Hope this helps.
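For the data shown in the question, V4 looks continuous (values such as -12.60 and 6.16), so nearly every distinct value would be treated as its own class with one observation. Below is a minimal sketch of checking for that and recoding the response to two classes. The data are simulated to resemble the question's columns, and the cut-off at 0 and the labels "up"/"down" are purely hypothetical choices; the right recoding depends on what V4 means.

```r
set.seed(42)

# Simulated stand-in for the poster's data (assumption: NOT the real file)
dat <- data.frame(
  V1 = rnorm(359, mean = 28000, sd = 3000),
  V2 = runif(359, min = 150, max = 15000),
  V3 = rnorm(359, mean = 15000, sd = 4000),
  V4 = round(rnorm(359, mean = 0, sd = 10), 2)
)

# Diagnose: how many distinct response values, and how small is the rarest?
classCounts <- table(dat$V4)
print(length(classCounts))   # far more than 2 for a continuous response
print(min(classCounts))

# Hypothetical recoding to a two-class factor (threshold 0 is an assumption)
yBin <- factor(ifelse(dat$V4 > 0, "up", "down"))
print(table(yBin))

# Predictor matrix; drop the intercept column since glmnet fits its own
x <- model.matrix(~ V1 + V2 + V3, data = dat)[, -1]

# Fit only if glmnet is installed
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x, yBin, family = "binomial")
}
```

If V4 is really meant to be a continuous outcome, family="gaussian" would be the alternative to recoding it.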