Interpretation of .L, .Q, .C, ^4… for logistic regression

I've done a good amount of googling, and the explanations either don't make any sense or they just say to use factors instead of ordinal data. I understand that `.L` is linear, `.Q` is quadratic, etc., but I don't know how to actually say what it means. So for example, let's say

Primary.L     7.73502       0.984
Primary.Q     6.81674       0.400
Primary.C     -4.07055      0.450
Primary^4     1.48845       0.600

where the first column is the variable, the second is the estimate, and the third is the p-value. What would I be saying about the variables as they increase in order? Is this basically telling me which model to use, so that the model would be 7.73502x + 6.81674x^2 - 4.07055x^3? Or would it just include the quadratic term? All of this is so confusing. If anyone could shed some light on how to interpret these .L, .Q, .C, etc., that would be fantastic.

example

> summary(glm(DEPENDENT ~ Year, data = HAVE, family = "binomial"))

Call:
glm(formula = DEPENDENT ~ Year, family = "binomial", data = HAVE)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.3376  -0.2490  -0.2155  -0.1635   3.1802  

Coefficients:
             Estimate Std. Error  z value Pr(>|z|)    
(Intercept) -3.572966   0.028179 -126.798  < 2e-16 ***
Year.L      -2.212443   0.150295  -14.721  < 2e-16 ***
Year.Q      -0.932844   0.162011   -5.758 8.52e-09 ***
Year.C       0.187344   0.156462    1.197   0.2312    
Year^4      -0.595352   0.147113   -4.047 5.19e-05 ***
Year^5      -0.027306   0.135214   -0.202   0.8400    
Year^6      -0.023756   0.120969   -0.196   0.8443    
Year^7       0.079723   0.111786    0.713   0.4757    
Year^8      -0.080749   0.103615   -0.779   0.4358    
Year^9      -0.117472   0.098423   -1.194   0.2327    
Year^10     -0.134956   0.095098   -1.419   0.1559    
Year^11     -0.106700   0.089791   -1.188   0.2347    
Year^12      0.102289   0.088613    1.154   0.2484    
Year^13      0.125736   0.084283    1.492   0.1357    
Year^14     -0.009941   0.084058   -0.118   0.9059    
Year^15     -0.173013   0.088781   -1.949   0.0513 .  
Year^16     -0.146597   0.090398   -1.622   0.1049    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 18687  on 80083  degrees of freedom
Residual deviance: 18120  on 80067  degrees of freedom
AIC: 18154

Number of Fisher Scoring iterations: 7

asked Jul 31 '19 19:07

hubertsng

1 Answer

That output indicates that your predictor Year is an "ordered factor", meaning R not only treats the values of that variable as distinct categories or groups (i.e., a factor) but also understands that the categories have a natural order in which one category is considered larger than another.
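As a minimal sketch of the distinction (the `year_raw` values are made up for illustration and are not the asker's data), the following compares a plain factor with an ordered factor and looks up which contrast scheme a default R session assigns to each:

```r
# Hypothetical toy data: a few years stored as plain numbers
year_raw <- c(2001, 2002, 2003, 2002, 2001, 2003)

# An unordered factor treats the years as mere labels
year_factor <- factor(year_raw)

# An ordered factor additionally records that 2001 < 2002 < 2003
year_ordered <- factor(year_raw, ordered = TRUE)

is.ordered(year_factor)   # FALSE
is.ordered(year_ordered)  # TRUE

# Contrast scheme R uses for each kind of factor
# (assumes the session has not changed options("contrasts"))
default_contrasts <- getOption("contrasts")
default_contrasts
```

If the polynomial terms are not wanted, converting the predictor with `factor(Year, ordered = FALSE)` should give the usual treatment (dummy) coding instead.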

In this situation, R's default is to fit a series of polynomial functions or contrasts to the levels of the variable. The first is linear (.L), the second is quadratic (.Q), the third is cubic (.C), and so on. R will fit one fewer polynomial function than the number of available levels. Your output shows 16 contrasts (Year.L through Year^16), which indicates there are 17 distinct years in your data.
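The naming and the "one fewer than the number of levels" rule can be checked directly with `contr.poly()`, the function behind R's default contrasts for ordered factors. This is only a sketch; the 5-level case is an arbitrary small size chosen for readability.

```r
# Polynomial contrasts for an ordered factor with 5 levels (toy size)
poly5 <- contr.poly(5)
colnames(poly5)   # ".L" ".Q" ".C" "^4"
dim(poly5)        # 5 rows (levels), 4 columns (contrasts)

# With 17 levels there are 16 contrast columns, matching Year.L ... Year^16
poly17 <- contr.poly(17)
ncol(poly17)                # 16
tail(colnames(poly17), 1)   # "^16"
```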

You can probably think of those 17 (counting the intercept) predictors in your output as entirely new variables all based on the order of your original variable because R creates them using special values that make all the new predictors orthogonal (i.e., unrelated, linearly independent, or uncorrelated) to each other.

One way to see the values that got used is to use the model.matrix() function on your model object.

model.matrix(glm(DEPENDENT ~ Year, data = HAVE, family = "binomial"))

If you run the above, you will find a bunch of repeated numbers within each of the new variable columns where the changes in repetition correspond to where your original Year predictor switched categories. The specific values themselves hold no real meaning to you because they were chosen/computed by R to make all the contrasts linearly independent of one another.
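Without access to the original `HAVE` data, the same idea can be sketched on the contrast matrix itself. The `toy` data frame below is hypothetical; the check assumes R's default equally spaced scores for the 17 levels.

```r
poly17 <- contr.poly(17)

# Columns are orthonormal: their cross-product is (numerically) the identity
max_offdiag_error <- max(abs(crossprod(poly17) - diag(16)))
max_offdiag_error

# Each column sums to (numerically) zero, so it is also orthogonal
# to the intercept column of ones
max_colsum <- max(abs(colSums(poly17)))
max_colsum

# Hypothetical toy factor showing the repeated values in model.matrix()
toy <- data.frame(Year = factor(c(1, 1, 2, 2, 3, 3), ordered = TRUE))
toy_mm <- model.matrix(~ Year, data = toy)
toy_mm
```

Rows of `toy_mm` that share a level share identical values, which is the repetition described above.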

Therefore, your model in the R output would be:

logit(p) = -3.57 - 2.21 * Year.L - 0.93 * Year.Q + ... - 0.15 * Year^16

where p is the probability of presence of the characteristic of interest, and the logit transformation is defined as the logged odds where odds = p / (1 - p) and logged odds = ln(odds). Therefore logit(p) = ln(p / (1 - p)).
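To make the equation concrete, here is a rough sketch that plugs the estimates from the question's output into the contrast scores and converts each level's logit back to a probability. It assumes the model used R's default `contr.poly(17)` scores (equally spaced levels); the variable names are my own.

```r
# Estimates copied from the summary() output in the question
intercept <- -3.572966
betas <- c(-2.212443, -0.932844,  0.187344, -0.595352, -0.027306, -0.023756,
            0.079723, -0.080749, -0.117472, -0.134956, -0.106700,  0.102289,
            0.125736, -0.009941, -0.173013, -0.146597)

# Rows = the 17 ordered levels of Year, columns = Year.L ... Year^16
scores <- contr.poly(17)

# Linear predictor (logit) for each level, then back-transform:
# plogis(x) = 1 / (1 + exp(-x)) is the inverse of the logit
logit_by_level <- intercept + as.vector(scores %*% betas)
prob_by_level <- plogis(logit_by_level)

round(data.frame(level = 1:17, logit = logit_by_level, prob = prob_by_level), 4)
```

Because every contrast column sums to zero, the intercept equals the unweighted mean of the 17 level-specific logits.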

The interpretation of a particular beta test is then generalized to: Which contrasts contribute significantly to explaining differences in your dependent variable across the levels of the predictor? Because your Year.L predictor is significant and negative, this suggests a linear decreasing trend in the logit across years, and because your Year.Q predictor is significant and negative, this suggests a deceleration trend is detectable in the pattern of logits across years. Third order polynomials model jerk, and fourth order polynomials model jounce (a.k.a. snap). However, I would stop interpreting around this order and higher because it quickly becomes nonsensical to practical folk.
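A small simulation can illustrate how a decreasing trend shows up in the linear contrast. This is a hypothetical sketch, not the asker's data: it assumes 5 ordered levels whose true logit falls by 0.5 per level, with no curvature.

```r
set.seed(1)

# Hypothetical simulation: 5 ordered levels, logit falls by 0.5 per level
n_per_level <- 2000
level <- rep(1:5, each = n_per_level)
true_logit <- -1 - 0.5 * (level - 3)
y <- rbinom(length(level), size = 1, prob = plogis(true_logit))

sim <- data.frame(y = y, year = factor(level, ordered = TRUE))
fit <- glm(y ~ year, data = sim, family = "binomial")

# year.L should come out clearly negative; the higher-order terms should
# be near zero because the simulated trend is a straight line in the logit
round(coef(summary(fit)), 3)
```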

Interpreting a particular beta estimate is a bit nonsensical to me, but it would go like this: exponentiating the estimate gives the odds ratio comparing the odds of the outcome at a given value of a particular contrast (e.g., quadratic) with the odds at that value less one unit. For the quadratic contrast in your example, the odds ratio would be exp(-0.9328) = 0.3935, but I say it's a bit nonsensical because the units have little practical meaning: they were chosen by R to make the predictors linearly independent of one another. Thus I prefer focusing on the interpretation of a given contrast test as opposed to the coefficient in this circumstance.
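The following sketch shows why the "one unit" in that odds ratio is hard to relate to years. It assumes the default `contr.poly(17)` scores were used; `beta_q` is the Year.Q estimate from the question's output.

```r
beta_q <- -0.932844           # Year.Q estimate from the question's output
odds_ratio_q <- exp(beta_q)   # about 0.39
odds_ratio_q

# The "unit" is one unit of the quadratic contrast score, not one year.
# The scores assigned to the 17 levels span less than one full unit:
q_scores <- contr.poly(17)[, ".Q"]
range(q_scores)
q_span <- diff(range(q_scores))
q_span
```

Since no two years differ by a full unit on the quadratic score, the odds ratio never corresponds to an actual comparison between two years in the data.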

For further reading, UCLA's wonderful IDRE has a webpage that discusses how to interpret odds ratios in logistic regression, and there is a crazy cool but intense Stack Exchange answer that walks through how R chooses the polynomial contrast weights.

answered Sep 21 '22 05:09

the-mad-statter
