Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to interpret coefficients of logistic regression

I'm trying to figure out how the coefficients of logistic regression with a polynomial term relate to predictions. Specifically, I'm interested in the location on the x-axis where the prediction is highest. Example below:

set.seed(42)

# Setup some dummy data
x <- 1:200
y <- rep(0, length(x))
y[51:150] <- rbinom(100, 1, 0.5)

# Fit a model
family <- binomial()
model  <- glm(y ~ poly(x, 2), family = family)

# Illustrate model
plot(x, y)
lines(x, family$linkinv(predict(model)), col = 2)

The model above gives me these coefficients:

coef(model)
#> (Intercept) poly(x, 2)1 poly(x, 2)2 
#>   -1.990317   -3.867855  -33.299893

Created on 2021-08-03 by the reprex package (v1.0.0)

The manual page for poly() states the following:

The orthogonal polynomial is summarized by the coefficients, which can be used to evaluate it via the three-term recursion given in Kennedy & Gentle (1980, pp. 343–4), and used in the predict part of the code.

However, I don't have access to the book, nor am I able to discern from the predict.glm S3 method how these coefficients are handled. Is there a way to reconstruct the location of the summit (around 100 in the example) from the coefficients alone (i.e. without using predict() to find the maximum)?

like image 889
teunbrand Avatar asked Oct 20 '25 09:10

teunbrand


1 Answers

Assuming you want to find the maximum of the prediction analytically for this particular case where the orthogonal polynomials are of order 1 and 2, I propose the following approach:

SUMMARY

1) Infer the polynomial coefficients

This can easily be done by fitting a linear model to the respective polynomial values contained in the model matrix.

2) Derive the prediction expression w.r.t. x and set the derivative to 0

Solve for x in the prediction expression inferred from the polynomial fit in (1) and obtain the value of x at which the prediction's maximum occurs.

DETAILS

1) Polynomial coefficients

Following from the line where you fit the GLM model, we estimate the coefficients for the polynomial of order 1, p1(x) = a0 + a1*x, and the coefficients for the polynomial of order 2, p2(x) = b0 + b1*x + b2*x^2:

X = model.matrix(model)
p1 = X[, "poly(x, 2)1"]
p2 = X[, "poly(x, 2)2"]

p1.lm = lm(p1 ~ x)
a0 = p1.lm$coeff["(Intercept)"]
a1 = p1.lm$coeff["x"]

p2.lm = lm(p2 ~ x + I(x^2))
b0 = p2.lm$coeff["(Intercept)"]
b1 = p2.lm$coeff["x"]
b2 = p2.lm$coeff["I(x^2)"]

This gives:

> c(a0, a1, b0, b1, b2)
   (Intercept)              x    (Intercept)              x         I(x^2) 
-1.2308840e-01  1.2247602e-03  1.6050353e-01 -4.7674315e-03  2.3718565e-05 

2) Derivative of the prediction to find the maximum

The expression for the prediction, z, (before the inverse link function) is:

z = Intercept + coef1 * p1(x) + coef2 * p2(x)

We derive this expression and set it to 0 to obtain:

coef1 * a1 + coef2 * (b1 + 2 * b2 * xmax) = 0

Solving for xmax we get:

xmax = - (coef1 * a1 + coef2 * b1) / (2 * coef2 * b2)

In R code, this is computed as:

coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )
(xmax = - ( coef1 * a1 + coef2 * b1 ) / (2 * coef2 * b2))

which gives:

        x 
97.501114

CHECK

We can verify the maximum by adding it to the prediction's curve as a green cross:

# Prediction curve computed analytically
Intercept = model$coeff["(Intercept)"]
pred.analytical = family$linkinv( Intercept + coef1 * p1 + coef2 * p2 )

# Find the prediction's maximum analytically
pred.max = family$linkinv( Intercept + coef1 * (a0 + a1 * xmax) +
                                       coef2 * (b0 + b1 * xmax + b2 * xmax^2) )


# Plot
plot(x, y)
# The following two lines should coincide!
lines(x, pred.analytical, col = 3)
lines(x, family$linkinv(predict(model)), col = 2)
# Location of the maximum!
points(xmax, pred.max, pch="x", col="green")

which gives:

enter image description here

like image 130
mastropi Avatar answered Oct 21 '25 23:10

mastropi