Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does predict.glm(, type="terms") actually do?

I am confused with the way predict.glm function in R works. According to the help,

The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.

Thus, if my model has form f(y) = X*beta, then command

predict(model, X, type='terms')

is expected to produce the same matrix X, multiplied by beta element-wise. For example, if I train the following model

test.data = data.frame(y = c(0,0,0,1,1,1,1,1,1), x=c(1,2,3,1,2,2,3,3,3))
model = glm(y~(x==1)+(x==2), family = 'binomial', data = test.data)

the resulting coefficients are

beta <- model$coef

Design matrix is

X <- model.matrix(y~(x==1)+(x==2), data = test.data)

  (Intercept) x == 1TRUE x == 2TRUE
1           1          1          0
2           1          0          1
3           1          0          0
4           1          1          0
5           1          0          1
6           1          0          1
7           1          0          0
8           1          0          0
9           1          0          0

Then multiplied by coefficients it should look like

pred1 <- t(beta * t(X))

  (Intercept) x == 1TRUE x == 2TRUE
1    1.098612  -1.098612  0.0000000
2    1.098612   0.000000 -0.4054651
3    1.098612   0.000000  0.0000000
4    1.098612  -1.098612  0.0000000
5    1.098612   0.000000 -0.4054651
6    1.098612   0.000000 -0.4054651
7    1.098612   0.000000  0.0000000
8    1.098612   0.000000  0.0000000
9    1.098612   0.000000  0.0000000

However, actual matrix produced by predict.glm seems to be unrelated to this. The following code

pred2 <- predict(model, test.data, type = 'terms')

      x == 1     x == 2
1 -0.8544762  0.1351550
2  0.2441361 -0.2703101
3  0.2441361  0.1351550
4 -0.8544762  0.1351550
5  0.2441361 -0.2703101
6  0.2441361 -0.2703101
7  0.2441361  0.1351550
8  0.2441361  0.1351550
9  0.2441361  0.1351550
attr(,"constant")
[1] 0.7193212

How does one interpret such results?

like image 627
David Dale Avatar asked Jun 22 '16 09:06

David Dale


People also ask

What is the use of the predict method in GLM?

predict.glm: Predict Method for GLM Fits Description Obtains predictions and optionally estimates standard errors of those predictions from a fitted generalized linear model object.

What does the term option do in a GLM model?

The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale. The value of this argument can be abbreviated. logical switch indicating if standard errors are required. the dispersion of the GLM fit to be assumed in computing the standard errors.

What is the return type of GLM terms?

the dispersion of the GLM fit to be assumed in computing the standard errors. If omitted, that returned by summary applied to the object is used. with type = "terms" by default all terms are returned. A character vector specifies which terms are to be returned

How do you use GLM in R?

The glm () function in R can be used to fit generalized linear models. This function is particularly useful for fitting logistic regression models, Poisson regression models, and other complex models. Once we’ve fit a model, we can then use the predict () function to predict the response value of a new observation.


1 Answers

I have already edited your question, to include "correct" way of getting (raw) model matrix, model coefficients, and your intended term-wise prediction. So your other question on how to get these are already solved. In the following, I shall help you understand predict.glm().


predict.glm() (actually, predict.lm()) has applied centring constraints for each model term when doing term-wise prediction.

Initially, you have a model matrix

X <- model.matrix(y~(x==1)+(x==2), data = test.data)

but it is centred, by dropping column means:

avx <- colMeans(X)
X1 <- sweep(X, 2L, avx)

> avx
(Intercept)  x == 1TRUE  x == 2TRUE 
  1.0000000   0.2222222   0.3333333 

> X1
  (Intercept) x == 1TRUE x == 2TRUE
1           0  0.7777778 -0.3333333
2           0 -0.2222222  0.6666667
3           0 -0.2222222 -0.3333333
4           0  0.7777778 -0.3333333
5           0 -0.2222222  0.6666667
6           0 -0.2222222  0.6666667
7           0 -0.2222222 -0.3333333
8           0 -0.2222222 -0.3333333
9           0 -0.2222222 -0.3333333

Then term-wise computation is done using this centred model matrix:

t(beta*t(X1))

  (Intercept) x == 1TRUE x == 2TRUE
1           0 -0.8544762  0.1351550
2           0  0.2441361 -0.2703101
3           0  0.2441361  0.1351550
4           0 -0.8544762  0.1351550
5           0  0.2441361 -0.2703101
6           0  0.2441361 -0.2703101
7           0  0.2441361  0.1351550
8           0  0.2441361  0.1351550
9           0  0.2441361  0.1351550

After centring, different terms are vertically shifted to have zero mean. As a result, intercept will be come 0. No worry, a new intercept is computed, by aggregating shifts of all model terms:

intercept <- as.numeric(crossprod(avx, beta))
# [1] 0.7193212

Now you should have seen what predict.glm(, type = "terms") gives you.

like image 97
Zheyuan Li Avatar answered Sep 20 '22 03:09

Zheyuan Li