
Interpretation of .L, .Q, .C, .4… for logistic regression

Tags: r, ordinal

I've done a good amount of googling, and the explanations either don't make any sense or say to just use factors instead of ordinal data. I understand that `.L` is linear, `.Q` is quadratic, etc., but I don't know how to actually say what it means. So for example, let's say

Primary.L     7.73502       0.984
Primary.Q     6.81674       0.400
Primary.C     -4.07055      0.450
Primary^4     1.48845       0.600

where the first column is the variable, the second is the estimate, and the third is the p-value. What would I be saying about the variables as they increase in order? Is this basically telling me which model to use, so that the model would be 7.73502x + 6.81674x^2 - 4.07055x^3? Or would it just include the quadratic term? All of this is so confusing. If anyone could shed some light on how to interpret these .L, .Q, .C, etc., that would be fantastic.

Example:

> summary(glm(DEPENDENT ~ Year, data = HAVE, family = "binomial"))

Call:
glm(formula = DEPENDENT ~ Year, family = "binomial", data = HAVE)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.3376  -0.2490  -0.2155  -0.1635   3.1802  

Coefficients:
             Estimate Std. Error  z value Pr(>|z|)    
(Intercept) -3.572966   0.028179 -126.798  < 2e-16 ***
Year.L      -2.212443   0.150295  -14.721  < 2e-16 ***
Year.Q      -0.932844   0.162011   -5.758 8.52e-09 ***
Year.C       0.187344   0.156462    1.197   0.2312    
Year^4      -0.595352   0.147113   -4.047 5.19e-05 ***
Year^5      -0.027306   0.135214   -0.202   0.8400    
Year^6      -0.023756   0.120969   -0.196   0.8443    
Year^7       0.079723   0.111786    0.713   0.4757    
Year^8      -0.080749   0.103615   -0.779   0.4358    
Year^9      -0.117472   0.098423   -1.194   0.2327    
Year^10     -0.134956   0.095098   -1.419   0.1559    
Year^11     -0.106700   0.089791   -1.188   0.2347    
Year^12      0.102289   0.088613    1.154   0.2484    
Year^13      0.125736   0.084283    1.492   0.1357    
Year^14     -0.009941   0.084058   -0.118   0.9059    
Year^15     -0.173013   0.088781   -1.949   0.0513 .  
Year^16     -0.146597   0.090398   -1.622   0.1049    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 18687  on 80083  degrees of freedom
Residual deviance: 18120  on 80067  degrees of freedom
AIC: 18154

Number of Fisher Scoring iterations: 7

hubertsng asked Jul 31 '19 19:07



1 Answer

That output indicates that your predictor Year is an "ordered factor", meaning R not only treats the values of that variable as distinct categories or groups (i.e., a factor) but also understands that the categories have a natural order in which one category is considered larger than another.
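As a minimal sketch of the difference between a plain factor and an ordered factor (the `years` vector below is made up for illustration, since the HAVE data set is not shown in the question):

```r
# Hypothetical year values; the real data (HAVE) is not available here.
years <- c(2003, 2001, 2002, 2001, 2003)

year_plain   <- factor(years)                  # unordered factor
year_ordered <- factor(years, ordered = TRUE)  # ordered factor

is.ordered(year_plain)    # FALSE
is.ordered(year_ordered)  # TRUE
levels(year_ordered)      # "2001" "2002" "2003"

# Shows which contrast functions R uses for unordered and ordered factors.
# In a default session the "ordered" entry is "contr.poly".
getOption("contrasts")
```

If Year had been stored as a plain factor, the same model would typically report one coefficient per non-reference year rather than .L, .Q, .C terms.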

In this situation, R's default is to fit a series of polynomial functions or contrasts to the levels of the variable. The first is linear (.L), the second is quadratic (.Q), the third is cubic (.C), and so on. R fits one fewer polynomial contrast than the number of available levels. Thus, the 16 contrasts in your output indicate there are 17 distinct years in your data.
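The naming and the "one fewer than the number of levels" rule can be checked directly with `contr.poly()`, the function R uses by default for ordered factors. A small sketch:

```r
# Polynomial contrasts for a 5-level ordered factor: 5 rows, 4 columns.
cp5 <- contr.poly(5)
colnames(cp5)        # ".L" ".Q" ".C" "^4"

# For 17 levels (as in the question's output) there are 16 contrast columns.
dim(contr.poly(17))  # 17 16
```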

You can think of those 17 predictors (counting the intercept) in your output as entirely new variables, all based on the order of your original variable: R creates them using special values that make all the new predictors orthogonal (i.e., unrelated, linearly independent, or uncorrelated) to each other.
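The orthogonality claim can be verified numerically. The sketch below assumes the default, equally spaced `contr.poly()` scores for 17 levels:

```r
cp <- contr.poly(17)

# Cross-products between all pairs of contrast columns.
# Off-diagonal entries near 0 mean the columns are mutually orthogonal;
# diagonal entries of 1 mean each column is scaled to unit length.
gram <- crossprod(cp)
all.equal(gram, diag(16), check.attributes = FALSE)

# Each column also sums to (numerically) zero, so it is orthogonal
# to the intercept column of ones as well.
round(colSums(cp), 10)
```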

One way to see the values that got used is to use the model.matrix() function on your model object.

model.matrix(glm(DEPENDENT ~ Year, data = HAVE, family = "binomial"))

If you run the above, you will find a bunch of repeated numbers within each of the new variable columns where the changes in repetition correspond to where your original Year predictor switched categories. The specific values themselves hold no real meaning to you because they were chosen/computed by R to make all the contrasts linearly independent of one another.
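Because the full matrix for 80,084 rows is unwieldy, it may help to look at a toy version first. The data frame `toy` below is hypothetical (four years, three rows each), not the asker's data:

```r
# Toy stand-in for HAVE: an ordered Year factor with 4 levels, 3 rows each.
toy <- data.frame(
  Year = factor(rep(2001:2004, each = 3), ordered = TRUE)
)

# Design matrix R would build for a formula with Year on the right-hand side.
mm <- model.matrix(~ Year, data = toy)
colnames(mm)   # "(Intercept)" "Year.L" "Year.Q" "Year.C"

# Rows repeat within a year, so there is one distinct row per level,
# and those rows are the contr.poly(4) scores next to the intercept column.
unique(mm)
```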

Therefore, your model in the R output would be:

logit(p) = -3.57 - 2.21 * Year.L - 0.93 * Year.Q + ... - 0.15 * Year^16

where p is the probability of presence of the characteristic of interest, and the logit transformation is defined as the logged odds where odds = p / (1 - p) and logged odds = ln(odds). Therefore logit(p) = ln(p / (1 - p)).
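For reference, base R provides `qlogis()` and `plogis()` for the logit and its inverse. A short sketch (the probability 0.03 is an arbitrary illustrative value):

```r
p <- 0.03                 # arbitrary example probability

log(p / (1 - p))          # logit computed by hand
qlogis(p)                 # same value via the built-in logit

# Inverse logit: converts a value on the logit scale back to a probability.
# Applied to the intercept from the output, this gives roughly 0.027.
plogis(-3.572966)
```

Since polynomial contrast columns sum to zero across levels, the intercept here corresponds to the average logit over the 17 years rather than to any single reference year.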

The interpretation of the test for a particular beta then generalizes to: which contrasts contribute significantly to explaining differences in your dependent variable across levels? Because your Year.L predictor is significant and negative, this suggests a decreasing linear trend in the logit across years, and because your Year.Q predictor is significant and negative, this suggests detectable negative curvature (a concave, inverted-U-shaped component, i.e., negative acceleration) in the pattern of logits across years. Third-order polynomials model jerk, and fourth-order polynomials model jounce (a.k.a. snap). However, I would stop interpreting at around this order because higher orders quickly become nonsensical to practical folk.
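One way to make the trend concrete is to turn the coefficients back into one fitted logit per year. The sketch below copies the estimates from the question's output and assumes Year used R's default equally spaced `contr.poly()` scores; the variable names are illustrative only:

```r
# Estimates copied from the summary() output in the question.
intercept <- -3.572966
beta <- c(-2.212443, -0.932844, 0.187344, -0.595352, -0.027306, -0.023756,
           0.079723, -0.080749, -0.117472, -0.134956, -0.106700, 0.102289,
           0.125736, -0.009941, -0.173013, -0.146597)

# Assumption: default polynomial contrasts for a 17-level ordered factor.
cp <- contr.poly(17)

# One fitted logit per level: intercept plus weighted sum of contrast scores.
level_logit <- intercept + drop(cp %*% beta)
level_prob  <- plogis(level_logit)   # same fits on the probability scale

# The contrast columns sum to zero, so the intercept is the mean level logit.
mean(level_logit)

# Slope of a straight line through the level logits; its sign matches Year.L.
coef(lm(level_logit ~ seq_along(level_logit)))[2]

# plot(level_logit, type = "b")  # uncomment to see the overall shape
```

Plotting the 17 fitted logits is often easier to explain to an audience than the individual contrast coefficients.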

Similarly, interpreting a particular beta estimate is a bit nonsensical to me, but it works like any other logistic regression coefficient: exponentiating the estimate gives the odds ratio comparing the odds of the outcome at a given value of a particular contrast (e.g., quadratic) with the odds at that value less one unit, holding the other contrasts fixed. For the quadratic contrast in your example, the odds ratio would be exp(-0.9328) = 0.3935, but I say it's a bit nonsensical because the units have little practical meaning: they were chosen by R to make the predictors linearly independent of one another. Thus I prefer focusing on the interpretation of a given contrast test as opposed to the coefficient in this circumstance.
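A quick sketch of why the "one unit" in that odds ratio is hard to interpret, again assuming the default `contr.poly()` scores for 17 levels:

```r
# Odds ratio for a one-unit increase in the quadratic contrast.
or_q <- exp(-0.932844)
or_q                      # about 0.393

# Scores actually taken by the quadratic contrast across the 17 levels.
q_scores <- contr.poly(17)[, ".Q"]
range(q_scores)

# The whole range spans less than one unit, so a full one-unit change
# in this predictor never occurs between any two years in the data.
diff(range(q_scores))
```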

For further reading, here is a webpage at UCLA's wonderful IDRE that discusses how to interpret odds ratios in logistic regression, and here is a crazy cool but intense Stack Exchange answer that walks through how R chooses the polynomial contrast weights.

the-mad-statter answered Sep 21 '22 05:09