I have read through the manual page ?poly
(which I admit I did not completely comprehend) and also the description of the function in the book Introduction to Statistical Learning.
My current understanding is that a call to poly(horsepower, 2) should be equivalent to writing horsepower + I(horsepower^2). However, this seems to be contradicted by the output of the following code:
library(ISLR)
summary(lm(mpg ~ poly(horsepower, 2), data = Auto))$coef
#                         Estimate Std. Error   t value      Pr(>|t|)
# (Intercept)             23.44592  0.2209163 106.13030 2.752212e-289
# poly(horsepower, 2)1  -120.13774  4.3739206 -27.46683  4.169400e-93
# poly(horsepower, 2)2    44.08953  4.3739206  10.08009  2.196340e-21

summary(lm(mpg ~ horsepower + I(horsepower^2), data = Auto))$coef
#                      Estimate   Std. Error   t value      Pr(>|t|)
# (Intercept)      56.900099702 1.8004268063  31.60367 1.740911e-109
# horsepower       -0.466189630 0.0311246171 -14.97816  2.289429e-40
# I(horsepower^2)   0.001230536 0.0001220759  10.08009  2.196340e-21
My question is: why don't the outputs match, and what is poly really doing?
1. Did you read this: inside-r.org/r-doc/stats/poly? The raw argument controls whether raw or orthogonal polynomials are used.
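For instance, here is a quick sketch (on a toy vector, not the question's data) of what raw controls: with raw = TRUE the columns are just the plain powers of x, while the default produces orthogonalized columns.

x <- 1:5
poly(x, 2, raw = TRUE)   # columns "1" and "2" are exactly x and x^2
cbind(x, x^2)            # the same values, without the poly attributes
poly(x, 2)               # default: orthogonal columns with different values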
Polynomial regression is used when the relationship between the variables is not linear, which is why the fit looks like a non-linear function.
Orthogonal Polynomials, 1. Introduction: Mathematically, orthogonal means perpendicular, that is, at right angles. For example, the set of vectors {x, y, z} in three-dimensional space is orthogonal.
To get ordinary polynomials as in the question, use raw = TRUE. Unfortunately, there is an undesirable aspect to ordinary polynomials in regression: if we fit a quadratic, say, and then a cubic, the lower-order coefficients of the cubic all differ from those of the quadratic, i.e. 56.900099702, -0.466189630, 0.001230536 for the quadratic vs. 6.068478e+01, -5.688501e-01, 2.079011e-03 after refitting with a cubic below.
library(ISLR)

fm2raw <- lm(mpg ~ poly(horsepower, 2, raw = TRUE), Auto)
cbind(coef(fm2raw))
##                                          [,1]
## (Intercept)                      56.900099702
## poly(horsepower, 2, raw = TRUE)1 -0.466189630
## poly(horsepower, 2, raw = TRUE)2  0.001230536

fm3raw <- lm(mpg ~ poly(horsepower, 3, raw = TRUE), Auto)
cbind(coef(fm3raw))
##                                           [,1]
## (Intercept)                       6.068478e+01
## poly(horsepower, 3, raw = TRUE)1 -5.688501e-01
## poly(horsepower, 3, raw = TRUE)2  2.079011e-03
## poly(horsepower, 3, raw = TRUE)3 -2.146626e-06
What we would really like is to add the cubic term in such a way that the lower-order coefficients fit using the quadratic stay the same after refitting with a cubic. To do this, take linear combinations of the columns of poly(horsepower, 2, raw = TRUE), and do the same with poly(horsepower, 3, raw = TRUE), such that the columns in the quadratic fit are orthogonal to each other, and similarly for the cubic fit. That is sufficient to guarantee that the lower-order coefficients won't change when we add higher-order terms. Note how the first three coefficients are now the same in the two sets below (whereas above they differ). That is, in both cases below the three lower-order coefficients are 23.44592, -120.13774 and 44.08953.
fm2 <- lm(mpg ~ poly(horsepower, 2), Auto)
cbind(coef(fm2))
##                            [,1]
## (Intercept)            23.44592
## poly(horsepower, 2)1 -120.13774
## poly(horsepower, 2)2   44.08953

fm3 <- lm(mpg ~ poly(horsepower, 3), Auto)
cbind(coef(fm3))
##                             [,1]
## (Intercept)            23.445918
## poly(horsepower, 3)1 -120.137744
## poly(horsepower, 3)2   44.089528
## poly(horsepower, 3)3   -3.948849
Importantly, since the columns of poly(horsepower, 2) are just linear combinations of the columns of poly(horsepower, 2, raw = TRUE), the two quadratic models (orthogonal and raw) represent the same model (i.e. they give the same predictions) and differ only in parameterization. For example, the fitted values are the same:
all.equal(fitted(fm2), fitted(fm2raw))
## [1] TRUE
This would also be true of the raw and orthogonal cubic models.
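One way to see the linear-combination point concretely is the sketch below; it assumes that the orthogonalization poly performs agrees, up to the sign of each column, with a QR decomposition of the raw design matrix (intercept, x, x^2) taken in that order.

library(ISLR)                      # for the Auto data
x <- Auto$horsepower
Q <- qr.Q(qr(cbind(1, x, x^2)))    # orthonormalize intercept, x, x^2 in order
P <- poly(x, 2)
max(abs(abs(Q[, -1]) - abs(P)))    # essentially zero: same columns up to sign

In other words, the orthogonal columns arise from the raw columns by successive orthogonalization, which is exactly a linear recombination of them.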
We can verify that the polynomials do have orthogonal columns which are also orthogonal to the intercept:
nr <- nrow(Auto)
e <- rep(1, nr) / sqrt(nr)   # constant vector of unit length
p <- cbind(e, poly(Auto$horsepower, 2))
zapsmall(crossprod(p))
##   e 1 2
## e 1 0 0
## 1 0 1 0
## 2 0 0 1
Another nice property of orthogonal polynomials: because poly produces a matrix whose columns have unit length and are mutually orthogonal (and also orthogonal to the intercept column), the reduction in the residual sum of squares from adding the cubic term is simply the squared length of the projection of the response vector onto the cubic column of the model matrix.
# these three give the same result

# 1. squared length of projection of y, i.e. Auto$mpg, on the cubic term column
crossprod(model.matrix(fm3)[, 4], Auto$mpg)^2
##        [,1]
## [1,] 15.5934

# 2. difference in residual sums of squares
deviance(fm2) - deviance(fm3)
## [1] 15.5934

# 3. difference in sums of squares from anova
anova(fm2, fm3)
##
## Analysis of Variance Table
##
## Model 1: mpg ~ poly(horsepower, 2)
## Model 2: mpg ~ poly(horsepower, 3)
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    389 7442.0
## 2    388 7426.4  1    15.593 0.8147 0.3673   <-- note Sum of Sq value
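Equivalently, because the cubic column has unit length and is orthogonal to all the other columns, the fitted cubic coefficient is itself that projection, so squaring it gives the same quantity (a quick sketch using the fits above):

coef(fm3)[4]^2   # about 15.5934, matching the Sum of Sq above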
When introducing polynomial terms in a statistical model, the usual motivation is to determine whether the response is "curved" and whether the curvature is "significant" when that term is added. The upshot of throwing in an I(x^2) term is that minor deviations may get "magnified" by the fitting process, depending on their location, and misinterpreted as due to the curvature term when they were just fluctuations at one end or the other of the range of the data. This results in inappropriate declarations of "significance".
If you just throw in a squared term with I(x^2), of necessity it is also going to be highly correlated with x, at least in the domain where x > 0. Using poly(x, 2) instead creates a "curved" set of variables where the linear term is not so highly correlated with x, and where the curvature is roughly the same across the range of the data. (If you want to read up on the statistical theory, search on "orthogonal polynomials".) Just type poly(1:10, 2) and look at the two columns.
poly(1:10, 2)
                 1           2
 [1,] -0.49543369  0.52223297
 [2,] -0.38533732  0.17407766
 [3,] -0.27524094 -0.08703883
 [4,] -0.16514456 -0.26111648
 [5,] -0.05504819 -0.34815531
 [6,]  0.05504819 -0.34815531
 [7,]  0.16514456 -0.26111648
 [8,]  0.27524094 -0.08703883
 [9,]  0.38533732  0.17407766
[10,]  0.49543369  0.52223297
attr(,"degree")
[1] 1 2
attr(,"coefs")
attr(,"coefs")$alpha
[1] 5.5 5.5
attr(,"coefs")$norm2
[1]   1.0  10.0  82.5 528.0
attr(,"class")
[1] "poly"   "matrix"
The "quadratic" term is centered on 5.5 and the linear term has been shifted down so it is 0 at the same x-point (with the implicit (Intercept)
term in the model being depended upon for shifting everything back at the time predictions are requested.)
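To see the correlation point numerically, here is a small sketch on toy positive data (analogous to horsepower, which is strictly positive):

x <- 1:100
cor(x, x^2)            # close to 1: the raw linear and squared terms are nearly collinear
cor(poly(x, 2))[1, 2]  # essentially 0: the orthogonal columns are uncorrelated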