Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does R handle ordinal predictors in lm()?

As I understand it, when you fit a linear model in R using a nominal predictor, R essentially uses dummy 1/0 variables for each level (except the reference level), and then giving a regular old coefficient for each of these variables. What does it do for ordinal predictors? It gives me estimated coefficients for each level, so it's not just treating the ranks as a numerical predictor. How do I properly interpret output like this?

cars.data <- mtcars[,1:3]
cars.data$cyl.ord <- ordered(cars.data$cyl)
lm(mpg ~ disp + cyl.ord, data = cars.data)
Call:  
lm(formula = mpg ~ disp + cyl.ord)  

Coefficients:  
(Intercept)         disp    cyl.ord.L    cyl.ord.Q  
   26.34212     -0.02731     -3.38852      1.95127 

I originally assumed that I could use the coefficients from the ordinal variable the same way I would if it were an unordered categorical, but predict.lm(ord.model, newdata = data.frame("disp" = 150, "cyl.ord" = "6")) gives 20.65263, while 26.34212378 -0.02730864*150 -3.38851642 gives only 18.85731, so that can't be it. (I tried +1.95127 and +0 in place of -3.38852 in case I was just mixed up about what order they were listed in, but no dice.) I couldn't find anything in the documentation for order() or lm(), online, or in any of my books.

How does R deal with ordinal predictors mathematically, and how should I deal with the output?

like image 267
MissMonicaE Avatar asked Jan 30 '17 19:01

MissMonicaE


1 Answers

It appears that transforming a factor variable into an ordinal factor variable changes the default contrast from "contr.treatment" to "contr.poly"

# make cyl a factor
cars.data$cyl <- factor(cars.data$cyl)

Now, consider the outputs of lm with "contr.treatment" and "contr.poly"

lm(mpg ~ disp + cyl.ord, data=cars.data)

Call:
lm(formula = mpg ~ disp + cyl.ord, data = cars.data)

Coefficients:
(Intercept)         disp    cyl.ord.L    cyl.ord.Q  
   26.34212     -0.02731     -3.38852      1.95127  

lm(mpg ~ disp + cyl, data=cars.data,
   contrasts=list(cyl="contr.poly"))

Call:
lm(formula = mpg ~ disp + cyl, data = cars.data,
   contrasts=list(cyl = "contr.poly"))

Coefficients:
(Intercept)         disp        cyl.L        cyl.Q  
   26.34212     -0.02731     -3.38852      1.95127  

Sor the ordered factor uses "contr.poly" as the default contrast and we can get the same results from an unordered factor. Now, consider the unordered factor.

lm(mpg ~ disp + cyl, data=cars.data)

Call:
lm(formula = mpg ~ disp + cyl, data = cars.data)

Coefficients:
(Intercept)         disp         cyl6         cyl8  
   29.53477     -0.02731     -4.78585     -4.79209  

lm(mpg ~ disp + cyl.ord, data=cars.data, contrasts=list(cyl.ord="contr.treatment"))

Call:
lm(formula = mpg ~ disp + cyl.ord, data = cars.data,
   contrasts=list(cyl.ord="contr.treatment"))

Coefficients:
(Intercept)         disp     cyl.ord6     cyl.ord8  
   29.53477     -0.02731     -4.78585     -4.79209

So, an unordered factor variable uses "contr.treatment" by default and we can get the same results from an ordered factor by explicitly asking for it.

But let's take a closer look at the model matrices that are used in the regressions.

# Show model matrix
model.matrix(mpg ~ disp + cyl, data=cars.data)
                    (Intercept)  disp cyl6 cyl8
Mazda RX4                     1 160.0    1    0
Mazda RX4 Wag                 1 160.0    1    0
Datsun 710                    1 108.0    0    0
...
attr(,"assign")
[1] 0 1 2 2
attr(,"contrasts")
attr(,"contrasts")$cyl
[1] "contr.treatment"

Now, use "contr.poly" as contrast

model.matrix(mpg ~ disp + cyl, data=cars.data, contrasts.arg=list(cyl="contr.poly"))
                    (Intercept)  disp         cyl.L      cyl.Q
Mazda RX4                     1 160.0 -9.073800e-17 -0.8164966
Mazda RX4 Wag                 1 160.0 -9.073800e-17 -0.8164966
Datsun 710                    1 108.0 -7.071068e-01  0.4082483
...
attr(,"assign")
[1] 0 1 2 2
attr(,"contrasts")
attr(,"contrasts")$cyl
[1] "contr.poly"

Next, check out cyl.ord in place of cyl

model.matrix(mpg ~ disp + cyl.ord, data=cars.data)
                    (Intercept)  disp     cyl.ord.L  cyl.ord.Q
Mazda RX4                     1 160.0 -9.073800e-17 -0.8164966
Mazda RX4 Wag                 1 160.0 -9.073800e-17 -0.8164966
Datsun 710                    1 108.0 -7.071068e-01  0.4082483
...
attr(,"assign")
[1] 0 1 2 2
attr(,"contrasts")
attr(,"contrasts")$cyl.ord
[1] "contr.poly"

The final two matrices have identical entries, so the use of "contr.poly" appears to explain the initial discrepencies. To learn more about contrasts, you can take a look at ?contrasts.

like image 97
lmo Avatar answered Sep 24 '22 09:09

lmo