poly() in lm(): difference between raw vs. orthogonal

Tags:

r

I have the following code:

library(ISLR)

# Polynomial Regression and Step Functions

fit <- lm(wage ~ poly(age, 4), data = Wage)
coef(summary(fit))

fit2 <- lm(wage ~ poly(age, 4, raw = TRUE), data = Wage)
coef(summary(fit2))

# Predict only over the observed age range (no need for attach())
age_grid <- seq(min(Wage$age), max(Wage$age))
plot(wage ~ age, data = Wage)
lines(age_grid, predict(fit,  newdata = data.frame(age = age_grid)), lwd = 3, col = "darkred")
lines(age_grid, predict(fit2, newdata = data.frame(age = age_grid)), lwd = 3, col = "steelblue", lty = 2)

The prediction lines appear to be identical, so why are the coefficients so different? How do you interpret them with raw = TRUE and raw = FALSE?

I see that the coefficients produced with poly(...,raw=T) match the ones with ~age+I(age^2)+I(age^3)+I(age^4).
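A minimal check of this equivalence, using the built-in cars data instead of Wage so that it runs without ISLR (the object names are only illustrative). With raw coefficients, a "manual" prediction is ordinary polynomial evaluation b0 + b1*x + b2*x^2:

```r
# Raw coding: the columns are simply speed and speed^2
m_raw <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)
m_I   <- lm(dist ~ speed + I(speed^2), data = cars)

# Same coefficients (only the names differ, so compare unnamed values)
all.equal(unname(coef(m_raw)), unname(coef(m_I)))

# Manual prediction from raw coefficients = plain polynomial evaluation
b <- coef(m_raw)
new_speed <- c(10, 15, 20)
manual <- b[1] + b[2] * new_speed + b[3] * new_speed^2
all.equal(unname(manual),
          unname(predict(m_raw, newdata = data.frame(speed = new_speed))))
```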

If I want to use the coefficients to compute predictions "manually" (without the predict() function), is there anything I should pay attention to? And how should I interpret the coefficients of the orthogonal polynomials in poly()?


asked May 02 '15 08:05

ECII

2 Answers

By default, with raw = FALSE, poly() computes an orthogonal polynomial basis. Internally it first sets up the raw coding x, x^2, x^3, ... (after centering x) and then orthogonalizes the columns via a QR decomposition, so that each column is orthogonal to the intercept and to all lower-order columns and has unit length. This does not change the fitted values, but it has the advantage that you can see whether a given order of the polynomial significantly improves the regression over the lower orders.
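A minimal sketch of what this means in practice, using the cars data from the example below. It checks that the basis columns are orthonormal with mean zero, that the fitted values match the raw coding, and it shows how to reproduce predictions by hand. It relies on the fact that calling predict() on a "poly" object re-evaluates the basis at new x values using the constants stored in its "coefs" attribute; the object names P, Pm, B_new and manual are only illustrative.

```r
# Orthogonal polynomial basis for speed (raw = FALSE is the default)
P  <- poly(cars$speed, 2)
Pm <- unclass(P)[, 1:2]          # plain numeric matrix of the two basis columns

round(crossprod(Pm), 10)         # identity: unit length, mutually orthogonal
round(colSums(Pm), 10)           # zeros: orthogonal to the intercept column

# Same column space as the raw coding, hence identical fitted values
m1 <- lm(dist ~ poly(speed, 2), data = cars)
m2 <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)
all.equal(fitted(m1), fitted(m2))

# Because the basis columns are centred, the intercept is the mean response
c(intercept = unname(coef(m1)[1]), mean_dist = mean(cars$dist))

# Manual prediction with orthogonal coefficients: evaluate the SAME basis at
# the new speeds, then combine with the coefficients. Plugging speed and
# speed^2 directly into these coefficients would give wrong predictions.
new_speed <- c(10, 15, 20)
B_new  <- predict(P, new_speed)              # basis at new x, reusing stored coefs
manual <- drop(cbind(1, B_new) %*% coef(m1))
all.equal(unname(manual),
          unname(predict(m1, newdata = data.frame(speed = new_speed))))
```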

Consider the simple cars data with response stopping distance and driving speed. Physically, this should have a quadratic relationship but in this (old) dataset the quadratic term is not significant:

m1 <- lm(dist ~ poly(speed, 2), data = cars)
m2 <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)

In the orthogonal coding you get the following coefficients in summary(m1):

                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       42.980      2.146  20.026  < 2e-16 ***
poly(speed, 2)1  145.552     15.176   9.591 1.21e-12 ***
poly(speed, 2)2   22.996     15.176   1.515    0.136

This shows that there is a highly significant linear effect while the second order is not significant. The latter p-value (i.e., the one of the highest order in the polynomial) is the same as in the raw coding:

                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                  2.47014   14.81716   0.167    0.868
poly(speed, 2, raw = TRUE)1  0.91329    2.03422   0.449    0.656
poly(speed, 2, raw = TRUE)2  0.09996    0.06597   1.515    0.136

but the lower order p-values change dramatically. The reason is that in model m1 the regressors are orthogonal while they are highly correlated in m2:

cor(model.matrix(m1)[, 2], model.matrix(m1)[, 3])
## [1] 4.686464e-17
cor(model.matrix(m2)[, 2], model.matrix(m2)[, 3])
## [1] 0.9794765

Thus, in the raw coding you can only interpret the p-value of speed if speed^2 remains in the model, and because both regressors are highly correlated, one of them can be dropped. In the orthogonal coding, however, speed^2 only captures the quadratic part that has not already been captured by the linear term. It then becomes clear that the linear part is significant while the quadratic part adds no further significance.
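A small check of this interpretation on the same cars data (object names are illustrative): with orthogonal coding, the intercept and linear estimates stay exactly the same when the quadratic term is added, whereas with raw coding the linear coefficient changes substantially once speed^2 enters the model.

```r
# Orthogonal coding: lower-order estimates do not move when a higher order is added
o1 <- coef(lm(dist ~ poly(speed, 1), data = cars))
o2 <- coef(lm(dist ~ poly(speed, 2), data = cars))
all.equal(unname(o1), unname(o2[1:2]))

# Raw coding: the meaning (and value) of the speed coefficient depends on speed^2
r1 <- coef(lm(dist ~ speed, data = cars))
r2 <- coef(lm(dist ~ speed + I(speed^2), data = cars))
cbind(linear_only = r1[2], with_quadratic = r2[2])
```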


answered Oct 17 '22 03:10

Achim Zeileis

I believe the way to run the polynomial regression with raw = TRUE is to look at the highest power term and assess its significance based on the p-value of that coefficient.

If it is not significant (large p-value), re-run the regression without that power (i.e., with the next lower degree), and repeat this one step at a time, reducing the degree while the top term remains non-significant.

As soon as the highest remaining power is significant, stop: that degree is taken as the appropriate one.
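The step-down procedure described above could be sketched as follows. select_poly_degree() is a hypothetical helper, not a function from any package; it assumes a single predictor, a fixed significance level alpha, and that no term is dropped as aliased (otherwise the row index into the coefficient table would shift).

```r
# Hypothetical helper: backward elimination of the polynomial degree (raw coding)
select_poly_degree <- function(y, x, max_degree = 4, alpha = 0.05) {
  for (d in max_degree:1) {
    fit   <- lm(y ~ poly(x, d, raw = TRUE))
    p_top <- coef(summary(fit))[d + 1, "Pr(>|t|)"]  # p-value of x^d (row 1 is the intercept)
    if (p_top < alpha) return(d)                    # highest power significant: stop here
  }
  0L                                                # not even the linear term is significant
}

select_poly_degree(cars$dist, cars$speed, max_degree = 4)
```

Since the p-value of the highest power is the same under raw and orthogonal coding (as the other answer shows), the loop would select the same degree with raw = FALSE.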

answered Oct 17 '22 02:10

Joe F
