I am confused with the way predict.glm function in R works. According to the help,
The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.
Thus, if my model has form f(y) = X*beta, then command
predict(model, X, type='terms')
is expected to produce the same matrix X, multiplied by beta element-wise. For example, if I train the following model
test.data = data.frame(y = c(0,0,0,1,1,1,1,1,1), x=c(1,2,3,1,2,2,3,3,3))
model = glm(y~(x==1)+(x==2), family = 'binomial', data = test.data)
the resulting coefficients are
beta <- model$coef
Design matrix is
X <- model.matrix(y~(x==1)+(x==2), data = test.data)
(Intercept) x == 1TRUE x == 2TRUE
1 1 1 0
2 1 0 1
3 1 0 0
4 1 1 0
5 1 0 1
6 1 0 1
7 1 0 0
8 1 0 0
9 1 0 0
Then multiplied by coefficients it should look like
pred1 <- t(beta * t(X))
(Intercept) x == 1TRUE x == 2TRUE
1 1.098612 -1.098612 0.0000000
2 1.098612 0.000000 -0.4054651
3 1.098612 0.000000 0.0000000
4 1.098612 -1.098612 0.0000000
5 1.098612 0.000000 -0.4054651
6 1.098612 0.000000 -0.4054651
7 1.098612 0.000000 0.0000000
8 1.098612 0.000000 0.0000000
9 1.098612 0.000000 0.0000000
However, actual matrix produced by predict.glm
seems to be unrelated to this. The following code
pred2 <- predict(model, test.data, type = 'terms')
x == 1 x == 2
1 -0.8544762 0.1351550
2 0.2441361 -0.2703101
3 0.2441361 0.1351550
4 -0.8544762 0.1351550
5 0.2441361 -0.2703101
6 0.2441361 -0.2703101
7 0.2441361 0.1351550
8 0.2441361 0.1351550
9 0.2441361 0.1351550
attr(,"constant")
[1] 0.7193212
How does one interpret such results?
predict.glm: Predict Method for GLM Fits Description Obtains predictions and optionally estimates standard errors of those predictions from a fitted generalized linear model object.
The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale. The value of this argument can be abbreviated. logical switch indicating if standard errors are required. the dispersion of the GLM fit to be assumed in computing the standard errors.
the dispersion of the GLM fit to be assumed in computing the standard errors. If omitted, that returned by summary applied to the object is used. with type = "terms" by default all terms are returned. A character vector specifies which terms are to be returned
The glm () function in R can be used to fit generalized linear models. This function is particularly useful for fitting logistic regression models, Poisson regression models, and other complex models. Once we’ve fit a model, we can then use the predict () function to predict the response value of a new observation.
I have already edited your question, to include "correct" way of getting (raw) model matrix, model coefficients, and your intended term-wise prediction. So your other question on how to get these are already solved. In the following, I shall help you understand predict.glm()
.
predict.glm()
(actually, predict.lm()
) has applied centring constraints for each model term when doing term-wise prediction.
Initially, you have a model matrix
X <- model.matrix(y~(x==1)+(x==2), data = test.data)
but it is centred, by dropping column means:
avx <- colMeans(X)
X1 <- sweep(X, 2L, avx)
> avx
(Intercept) x == 1TRUE x == 2TRUE
1.0000000 0.2222222 0.3333333
> X1
(Intercept) x == 1TRUE x == 2TRUE
1 0 0.7777778 -0.3333333
2 0 -0.2222222 0.6666667
3 0 -0.2222222 -0.3333333
4 0 0.7777778 -0.3333333
5 0 -0.2222222 0.6666667
6 0 -0.2222222 0.6666667
7 0 -0.2222222 -0.3333333
8 0 -0.2222222 -0.3333333
9 0 -0.2222222 -0.3333333
Then term-wise computation is done using this centred model matrix:
t(beta*t(X1))
(Intercept) x == 1TRUE x == 2TRUE
1 0 -0.8544762 0.1351550
2 0 0.2441361 -0.2703101
3 0 0.2441361 0.1351550
4 0 -0.8544762 0.1351550
5 0 0.2441361 -0.2703101
6 0 0.2441361 -0.2703101
7 0 0.2441361 0.1351550
8 0 0.2441361 0.1351550
9 0 0.2441361 0.1351550
After centring, different terms are vertically shifted to have zero mean. As a result, intercept will be come 0. No worry, a new intercept is computed, by aggregating shifts of all model terms:
intercept <- as.numeric(crossprod(avx, beta))
# [1] 0.7193212
Now you should have seen what predict.glm(, type = "terms")
gives you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With