Getting an error when using glmnet in caret
Example below:
# Load libraries
library(dplyr)
library(caret)
library(C50)
# Load the churn data set from the C50 package (provides churnTrain and churnTest)
data(churn)
# Create x and y variables
churn_x <- subset(churnTest, select= -churn)
churn_y <- churnTest[[20]]
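# Use createFolds() to create 5 CV folds on churn_y, the target variable
# (note: createFolds() returns the held-out rows by default, while trainControl(index = ...)
# expects the training rows; createFolds(churn_y, k = 5, returnTrain = TRUE) returns those)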
myFolds <- createFolds(churn_y, k = 5)
# Create trainControl object: myControl
myControl <- trainControl(
summaryFunction = twoClassSummary,
classProbs = TRUE, # IMPORTANT!
verboseIter = TRUE,
savePredictions = TRUE,
index = myFolds
)
# Fit glmnet model: model_glmnet
model_glmnet <- train(
x = churn_x, y = churn_y,
metric = "ROC",
method = "glmnet",
trControl = myControl
)
I'm getting the following error:
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
  NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning message:
In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
  NAs introduced by coercion
I have checked, and there are no missing values in the churn_x variables:
sum(is.na(churn_x))
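Counting NAs only rules out missing values; it does not show whether every column is numeric, which glmnet needs because it works on a numeric matrix. A minimal sketch of a type check, using a hypothetical toy data frame `df` in place of churn_x:

```r
# Hypothetical toy data frame standing in for churn_x: one factor, one numeric column
df <- data.frame(
  state   = factor(c("KS", "OH", "NJ")),
  minutes = c(265.1, 161.6, 243.4)
)

# No missing values at all...
n_missing <- sum(is.na(df))

# ...but not every column is numeric (factors are not numeric)
non_numeric <- names(df)[!vapply(df, is.numeric, logical(1))]
print(n_missing)     # 0
print(non_numeric)   # "state"
```

Run on churn_x instead of `df`, the same `vapply()` line should list the factor columns (such as state), even though the NA count is zero.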
Does anyone know the answer?
The problem is in the model specification. If you use the formula interface of caret's train(), the training works:
train <- data.frame(churn_x, churn_y)
model_glmnet <- train(churn_y ~ ., data = train,
metric = "ROC",
method = "glmnet",
trControl = myControl
)
> model_glmnet$results
alpha lambda ROC Sens Spec ROCSD SensSD SpecSD
1 0.10 0.0001754386 0.6958156 0.2845934 0.9123349 0.01855530 0.01616471 0.004002873
2 0.10 0.0017543858 0.7187303 0.2901986 0.9185721 0.01681286 0.01415863 0.005347573
3 0.10 0.0175438576 0.7399174 0.2355121 0.9487161 0.01482812 0.03932741 0.010769455
4 0.55 0.0001754386 0.6988285 0.2901800 0.9121614 0.01907845 0.01312159 0.004200233
5 0.55 0.0017543858 0.7260286 0.2946617 0.9185714 0.01761485 0.02171189 0.006755247
6 0.55 0.0175438576 0.7630039 0.2008939 0.9617103 0.01743847 0.03989938 0.006118592
7 1.00 0.0001754386 0.7009482 0.2924146 0.9119881 0.01958200 0.01233419 0.004157393
8 1.00 0.0017543858 0.7313495 0.2957728 0.9203040 0.01797853 0.02356945 0.008478577
9 1.00 0.0175438576 0.7672690 0.1595779 0.9760892 0.01935176 0.01935583 0.007938801
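However, when you specify x and y, it does not work, because glmnet expects x as a numeric model matrix. When you supply a formula, caret builds the model matrix for you (expanding factors into dummy variables); when you supply only x and y, it assumes x is already in that form and passes it on to glmnet. Because churn_x contains factor columns (such as state), converting it to a matrix produces character values, and coercing those to numbers yields the NAs reported in the warning. For instance, this works: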
x <- model.matrix(churn_y ~ ., data = train)
model_glmnet2 <- train(x = x, y = churn_y,
metric = "ROC",
method = "glmnet",
trControl = myControl
)
> model_glmnet2$results
alpha lambda ROC Sens Spec ROCSD SensSD SpecSD
1 0.10 0.0001754386 0.6958156 0.2845934 0.9123349 0.01855530 0.01616471 0.004002873
2 0.10 0.0017543858 0.7187303 0.2901986 0.9185721 0.01681286 0.01415863 0.005347573
3 0.10 0.0175438576 0.7399174 0.2355121 0.9487161 0.01482812 0.03932741 0.010769455
4 0.55 0.0001754386 0.6988285 0.2901800 0.9121614 0.01907845 0.01312159 0.004200233
5 0.55 0.0017543858 0.7260286 0.2946617 0.9185714 0.01761485 0.02171189 0.006755247
6 0.55 0.0175438576 0.7630039 0.2008939 0.9617103 0.01743847 0.03989938 0.006118592
7 1.00 0.0001754386 0.7009482 0.2924146 0.9119881 0.01958200 0.01233419 0.004157393
8 1.00 0.0017543858 0.7313495 0.2957728 0.9203040 0.01797853 0.02356945 0.008478577
9 1.00 0.0175438576 0.7672690 0.1595779 0.9760892 0.01935176 0.01935583 0.007938801
model.matrix is only needed when there are factor (or other non-numeric) features.
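A minimal base-R sketch of why the x/y call fails and model.matrix() fixes it, using a hypothetical toy data frame `df` rather than the churn data. It roughly mimics turning a mixed data frame into a matrix and forcing it to double storage; the exact internal steps in caret and glmnet may differ between versions.

```r
# Hypothetical toy data frame with one factor and one numeric predictor
df <- data.frame(
  plan    = factor(c("yes", "no", "no", "yes")),
  minutes = c(120.5, 98.2, 143.0, 110.7)
)

# as.matrix() on a data frame containing a factor falls back to a character matrix
m_chr <- as.matrix(df)
print(typeof(m_chr))   # "character"

# Forcing that matrix to double turns the factor labels into NA
# (R warns: "NAs introduced by coercion")
m_num <- m_chr
storage.mode(m_num) <- "double"
print(colSums(is.na(m_num)))   # plan: 4, minutes: 0

# model.matrix() instead expands the factor into a 0/1 dummy column
mm <- model.matrix(~ ., data = df)
print(colnames(mm))   # "(Intercept)" "planyes" "minutes"
```

The NA-filled column is what ends up in the numeric fitting routine, which matches the "NA/NaN/Inf in foreign function call" error in the question.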
If you are calling glmnet directly and get the same error, try this:
Short answer: using data.matrix() fixed my issue!
Initially, I was doing:
# Given X and Y are data frames
cv.glmnet(x = as.matrix(X), y = as.matrix(Y), alpha = 1, family = "binomial")
This was fixed by:
cv.glmnet(x = data.matrix(X), y = as.matrix(Y), alpha = 1, family = "binomial")
Longer answer (not long at all):
I had the same problem. I was passing my X matrix using as.matrix(), which coerces all columns of a data frame to a single common type; if you happen to have factors in your data frame, as.matrix() turns everything into character. Using data.matrix() fixed it for me: data.matrix() can handle factors and ordered factors, whereas as.matrix() is more basic.
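The difference can be seen on a hypothetical toy data frame `df`. One caveat: data.matrix() replaces each factor with its integer level codes, so an unordered factor becomes a single numeric column with an arbitrary (alphabetical) ordering. model.matrix() expands it into 0/1 dummy columns instead, which is usually what a linear model such as glmnet expects for nominal predictors. A minimal sketch:

```r
# Hypothetical toy data frame: an unordered factor and a numeric column
df <- data.frame(
  color = factor(c("red", "green", "blue", "green")),
  x1    = c(1.5, 2.0, 0.5, 3.2)
)

# data.matrix() replaces the factor with its integer level codes
dm <- data.matrix(df)
print(dm[, "color"])   # levels are blue, green, red, so: 3 2 1 2

# model.matrix() creates one 0/1 column per non-reference level instead
mm <- model.matrix(~ ., data = df)
print(colnames(mm))    # "(Intercept)" "colorgreen" "colorred" "x1"
```

Both results are numeric and will not trigger the coercion error; which one is appropriate depends on whether treating the level codes as numbers makes sense for your factors.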