So I'm using R to do logistic regression, but I'm using offsets.
mylogit <- glm(Y ~ X1 + offset(0.2*X2) + offset(0.4*X3), data = test, family = "binomial")
The output shows only two coefficients: the intercept and one of the predictors, X1.
Coefficients:
(Intercept)           X1
  0.5250748    0.0157259
My question: How do I get the raw prediction for each observation from this model? More specifically, if I use the predict function, will it include all the features and their coefficients, even though the model coefficients are listed as containing only the intercept and X1?
prob = predict(mylogit,test,type=c("response"))
Do I have to use the predict function? Does the "mylogit" object contain anything I can compute directly from? (yes I looked at the documentation on glm, still confused).
Thank you for your patience.
I can report the results of some experiments with glm and offset(). It does not appear (at least from this experiment) that your call to predict will give results that take the offset into account. Rather, it seems that summary.glm is needed for that purpose. I started with a rather mangled modification of the first example in ?glm (and this would be more pertinent to your concerns if you had provided data, because then we could play around more with the newdata argument that you would need for "test").
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
print(d.AD <- data.frame(treatment, outcome, counts))
glm.D93 <- glm(counts ~ outcome + treatment + offset(1:9), family = poisson())
glm.D93d <- glm(counts ~ outcome + treatment , family = poisson())
> predict(glm.D93d, type="response")
       1        2        3        4        5        6        7        8        9
21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 21.00000 13.33333 15.66667
> predict(glm.D93, type="response")
       1        2        3        4        5        6        7        8        9
21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 21.00000 13.33333 15.66667
As far as I can tell the offset is only apparent when comparisons of the estimated coefficients are made to the NULL estimate (usually 0) for the purposes of statistical inference. That is done by summary.glm:
> summary(glm.D93)$coef
             Estimate Std. Error    z value      Pr(>|z|)
(Intercept)  2.044522  0.1708987  11.963362  5.527764e-33
outcome2    -1.454255  0.2021708  -7.193203  6.328878e-13
outcome3    -2.292987  0.1927423 -11.896644  1.232021e-32
treatment2  -3.000000  0.2000000 -15.000000  7.341915e-51
treatment3  -6.000000  0.2000000 -30.000000 9.813361e-198
> summary(glm.D93d)$coef
                 Estimate Std. Error       z value     Pr(>|z|)
(Intercept)  3.044522e+00  0.1708987  1.781478e+01 5.426767e-71
outcome2    -4.542553e-01  0.2021708 -2.246889e+00 2.464711e-02
outcome3    -2.929871e-01  0.1927423 -1.520097e+00 1.284865e-01
treatment2   1.337909e-15  0.2000000  6.689547e-15 1.000000e+00
treatment3   1.421085e-15  0.2000000  7.105427e-15 1.000000e+00
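One possible reading of the two coefficient tables: in this particular example the offset 1:9 happens to equal 3*(treatment - 1) + outcome, so it is an exact linear combination of terms already in the model. If that is right, the coefficients should shift by exactly those amounts while the fitted values stay the same. A minimal sketch to check this, reusing the data above (the names off, fit_off, fit_plain, in_span, shift and same_fit are only illustrative):

```r
# Same data as in the example above
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
off       <- 1:9

# Is the offset an exact linear combination of the factor codes?
in_span <- all(off == 3 * (as.integer(treatment) - 1) + as.integer(outcome))

fit_off   <- glm(counts ~ outcome + treatment + offset(off), family = poisson())
fit_plain <- glm(counts ~ outcome + treatment, family = poisson())

# How far each coefficient moves when the offset is added
shift <- coef(fit_plain) - coef(fit_off)

# Do the two models give the same fitted values?
same_fit <- isTRUE(all.equal(fitted(fit_off), fitted(fit_plain), tolerance = 1e-6))

print(in_span)
print(round(shift, 4))
print(same_fit)
```

If the shifts come out as 1, 1, 2, 3 and 6, which is what the two tables above imply, then the identical predictions would be a property of this particular offset rather than of predict in general.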
The offset is only changing the reference levels (with fairly bizarre changes in this mangled example) while the fitting of $linear.predictors and $fitted to the data is not affected. I didn't see a comment in ?glm that addresses this, but there is a comment in ?lm: "Offsets specified by offset will not be included in predictions by predict.lm, whereas those specified by an offset term in the formula will be." I will admit that I got very little insight from reading ?model.offset.
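Since the ?lm note says that offsets given as a formula term are included in predictions, a direct check on a model shaped like the one in the question may be more informative than the Poisson example. The sketch below uses simulated data, because the original "test" data frame was not posted. The data, the seed and the "true" coefficients are all made up for illustration. It compares predict with a hand computation that adds the fixed offset terms 0.2*X2 and 0.4*X3 to the fitted intercept and X1 term.

```r
# Hypothetical stand-in for the unposted "test" data frame
set.seed(1)
n    <- 200
test <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
eta_true <- 0.5 + 0.1 * test$X1 + 0.2 * test$X2 + 0.4 * test$X3  # arbitrary values
test$Y   <- rbinom(n, 1, plogis(eta_true))

# Model from the question: X2 and X3 enter with fixed coefficients
mylogit <- glm(Y ~ X1 + offset(0.2 * X2) + offset(0.4 * X3),
               data = test, family = "binomial")

# Hand computation: estimated part plus the fixed offset part
b <- coef(mylogit)
eta_manual  <- b[["(Intercept)"]] + b[["X1"]] * test$X1 +
               0.2 * test$X2 + 0.4 * test$X3
prob_manual <- plogis(eta_manual)

# What predict() and the stored fitted values give
prob_predict <- predict(mylogit, newdata = test, type = "response")
prob_fitted  <- fitted(mylogit)

# Largest discrepancy between the hand computation and predict()
max_diff <- max(abs(prob_manual - prob_predict))
print(max_diff)
```

If max_diff is essentially zero, predict is using the offsets even though coef(mylogit) lists only the intercept and X1. The fitted object also stores mylogit$linear.predictors and mylogit$offset, so the same quantities can be read from it without calling predict.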