I hope this is not too naive a question.
I am performing a series of binomial regressions with different models in the caret package in R. All are working so far except for earth (MARS). Typically, the family is passed to the glm function through the earth function as glm=list(family=binomial). This seems to be working OK (as shown below). For the general predict() function, I would use type="response" to properly scale the prediction. The examples below show the non-caret approach in fit1, with the correct prediction in pred1; pred1a is the improperly scaled prediction without type="response". fit2 is the approach with caret and pred2 is its prediction; it is the same as the unscaled prediction in pred1a. Digging through the fit2 object, the properly fitted values are present in the glm.list component. Therefore, the earth() function is behaving as it should.
The question is: since caret's predict() function only accepts type="prob" or type="raw", how can I instruct it to predict on the scale of the response?
Thank you very much.
require(earth)
library(caret)
data(mtcars)
fit1 <- earth(am ~ cyl + mpg + wt + disp, data = mtcars,
              degree = 1, glm = list(family = binomial))
pred1 <- predict(fit1, newdata = mtcars, type="response")
range(pred1)
[1] 0.0004665284 0.9979135993 # Correct - binomial with response
pred1a <- predict(fit1, newdata = mtcars)
range(pred1a)
[1] -7.669725 6.170226 # without "response"
fit2ctrl <- trainControl(method = "cv", number = 5)
fit2 <- train(am ~ cyl + mpg + wt + disp, data = mtcars, method = "earth",
              trControl = fit2ctrl, tuneLength = 3,
              glm = list(family = 'binomial'))
pred2 <- predict(fit2, newdata = mtcars)
range(pred2)
[1] -7.669725 6.170226 # same as pred1a
# within glm.list object in fit2
[1] 0.0004665284 0.9979135993
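The two ranges above are consistent with each other: the unscaled values are on the log-odds (logit) scale, and the inverse logit maps -7.669725 and 6.170226 to 0.0004665 and 0.9979. A minimal sketch of that check on the non-caret fit, assuming the default logit link of family=binomial (the names link_scale and resp_scale are only illustrative):

```r
library(earth)

data(mtcars)

# Same non-caret fit as fit1 above
fit1 <- earth(am ~ cyl + mpg + wt + disp, data = mtcars,
              degree = 1, glm = list(family = binomial))

# Default prediction: the linear predictor of the GLM (log-odds)
link_scale <- predict(fit1, newdata = mtcars)

# Prediction on the response (probability) scale
resp_scale <- predict(fit1, newdata = mtcars, type = "response")

# plogis() is the inverse logit, 1 / (1 + exp(-x)); applying it to the
# link-scale values should reproduce the response-scale values
all.equal(as.numeric(plogis(link_scale)), as.numeric(resp_scale))
```

The same transformation could be applied to pred2 as a workaround, although that leaves caret resampling a regression model rather than a classifier.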
There are a few things:
- The outcome (mtcars$am) is numeric 0/1, so train will treat this as a regression model.
- If the outcome is a factor, train will assume classification and will automatically add glm=list(family=binomial).
- With train, you will need to add classProbs = TRUE to trainControl for the model to produce class probabilities.
Here is an example with a different data set in the earth package:
library(earth)
library(caret)
data(etitanic)
a1 <- earth(survived ~ .,
            data = etitanic,
            glm = list(family = binomial),
            degree = 2,
            nprune = 5)
etitanic$survived <- factor(ifelse(etitanic$survived == 1, "yes", "no"),
                            levels = c("yes", "no"))
a2 <- train(survived ~ .,
            data = etitanic,
            method = "earth",
            tuneGrid = data.frame(degree = 2, nprune = 5),
            trControl = trainControl(method = "none",
                                     classProbs = TRUE))
then:
> predict(a1, head(etitanic), type = "response")
      survived
[1,] 0.8846552
[2,] 0.9281010
[3,] 0.8846552
[4,] 0.4135716
[5,] 0.8846552
[6,] 0.4135716
>
> predict(a2, head(etitanic), type = "prob")
        yes         no
1 0.8846552 0.11534481
2 0.9281010 0.07189895
3 0.8846552 0.11534481
4 0.4135716 0.58642840
5 0.8846552 0.11534481
6 0.4135716 0.58642840
Max
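For completeness, the same three points can be applied to the mtcars data from the question. This is only a sketch: the copy named cars, the objects fit3 and pred3, and the level labels "manual" and "automatic" are illustrative choices, not anything from the original code. Because train tunes nprune by resampling, the selected model, and therefore the probabilities, need not match pred1 exactly, and with only 32 rows the GLM step may emit warnings on some folds.

```r
library(earth)
library(caret)

data(mtcars)

# Work on a copy and recode the 0/1 outcome as a factor so that train()
# fits a classification model. In mtcars, am == 1 means manual transmission.
# The level labels are illustrative; they must be valid R names because
# caret uses them as column names for the class probabilities.
cars <- mtcars
cars$am <- factor(ifelse(cars$am == 1, "manual", "automatic"),
                  levels = c("manual", "automatic"))

set.seed(1)  # make the cross-validation folds reproducible

# classProbs = TRUE asks caret to compute class probabilities;
# no glm argument is passed because caret supplies it for classification
fit3 <- train(am ~ cyl + mpg + wt + disp, data = cars, method = "earth",
              tuneLength = 3,
              trControl = trainControl(method = "cv", number = 5,
                                       classProbs = TRUE))

# One column per class, each value between 0 and 1
pred3 <- predict(fit3, newdata = cars, type = "prob")
range(pred3$manual)
```

Here pred3$manual plays the role of pred1 in the question: it is the predicted probability of am == 1 on the response scale.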