I have read through the manual page ?poly
(which I admit I did not completely comprehend) and also the description of the function in the book Introduction to Statistical Learning.
My current understanding is that a call to poly(horsepower, 2) should be equivalent to writing horsepower + I(horsepower^2). However, this seems to be contradicted by the output of the following code:
library(ISLR)
summary(lm(mpg ~ poly(horsepower, 2), data = Auto))$coef
#                         Estimate Std. Error   t value      Pr(>|t|)
# (Intercept)             23.44592  0.2209163 106.13030 2.752212e-289
# poly(horsepower, 2)1  -120.13774  4.3739206 -27.46683  4.169400e-93
# poly(horsepower, 2)2    44.08953  4.3739206  10.08009  2.196340e-21

summary(lm(mpg ~ horsepower + I(horsepower^2), data = Auto))$coef
#                      Estimate   Std. Error   t value      Pr(>|t|)
# (Intercept)      56.900099702 1.8004268063  31.60367 1.740911e-109
# horsepower       -0.466189630 0.0311246171 -14.97816  2.289429e-40
# I(horsepower^2)   0.001230536 0.0001220759  10.08009  2.196340e-21
My question is: why don't the outputs match, and what is poly really doing?
1. Did you read this: inside-r.org/r-doc/stats/poly? The raw argument controls whether raw or orthogonal polynomials are used.
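For instance, here is a quick sketch (on a toy vector, not the question's data) of what raw controls: with raw = TRUE the columns are just the plain powers of x, while the default produces orthogonalized columns.

x <- 1:5
poly(x, 2, raw = TRUE)   # columns "1" and "2" are exactly x and x^2
cbind(x, x^2)            # the same values, without the poly attributes
poly(x, 2)               # default: orthogonal columns with different values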
Polynomial regression is used when the relationship between the variables is not linear, which is why the fit looks like a non-linear function.
Orthogonal Polynomials, 1. Introduction: Mathematically, orthogonal means perpendicular, that is, at right angles. For example, the set of vectors {x, y, z} in three-dimensional space is orthogonal.
To get ordinary polynomials as in the question, use raw = TRUE. Unfortunately, there is an undesirable aspect to ordinary polynomials in regression: if we fit a quadratic, say, and then a cubic, the lower-order coefficients of the cubic all differ from those of the quadratic, i.e. 56.900099702, -0.466189630, 0.001230536 for the quadratic vs. 6.068478e+01, -5.688501e-01, 2.079011e-03 after refitting with a cubic below.
library(ISLR)

fm2raw <- lm(mpg ~ poly(horsepower, 2, raw = TRUE), Auto)
cbind(coef(fm2raw))
##                                          [,1]
## (Intercept)                      56.900099702
## poly(horsepower, 2, raw = TRUE)1 -0.466189630
## poly(horsepower, 2, raw = TRUE)2  0.001230536

fm3raw <- lm(mpg ~ poly(horsepower, 3, raw = TRUE), Auto)
cbind(coef(fm3raw))
##                                           [,1]
## (Intercept)                       6.068478e+01
## poly(horsepower, 3, raw = TRUE)1 -5.688501e-01
## poly(horsepower, 3, raw = TRUE)2  2.079011e-03
## poly(horsepower, 3, raw = TRUE)3 -2.146626e-06
What we would really like is to add the cubic term in such a way that the lower-order coefficients fit using the quadratic stay the same after refitting with a cubic. To do this, take linear combinations of the columns of poly(horsepower, 2, raw = TRUE), and do the same with poly(horsepower, 3, raw = TRUE), such that the columns in the quadratic fit are orthogonal to each other, and similarly for the cubic fit. That is sufficient to guarantee that the lower-order coefficients won't change when we add higher-order terms. Note how the first three coefficients are now the same in the two sets below (whereas above they differ). That is, in both cases below the three lower-order coefficients are 23.44592, -120.13774 and 44.08953.
fm2 <- lm(mpg ~ poly(horsepower, 2), Auto)
cbind(coef(fm2))
##                            [,1]
## (Intercept)            23.44592
## poly(horsepower, 2)1 -120.13774
## poly(horsepower, 2)2   44.08953

fm3 <- lm(mpg ~ poly(horsepower, 3), Auto)
cbind(coef(fm3))
##                             [,1]
## (Intercept)            23.445918
## poly(horsepower, 3)1 -120.137744
## poly(horsepower, 3)2   44.089528
## poly(horsepower, 3)3   -3.948849
Importantly, since the columns of poly(horsepower, 2) are just linear combinations of the columns of poly(horsepower, 2, raw = TRUE), the two quadratic models (orthogonal and raw) represent the same model (i.e. they give the same predictions) and differ only in parameterization. For example, the fitted values are the same:
all.equal(fitted(fm2), fitted(fm2raw))
## [1] TRUE
This would also be true of the raw and orthogonal cubic models.
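One way to see the linear-combination point concretely is the sketch below; it assumes that the orthogonalization poly performs agrees, up to the sign of each column, with a QR decomposition of the raw design matrix (intercept, x, x^2) taken in that order.

library(ISLR)                      # for the Auto data
x <- Auto$horsepower
Q <- qr.Q(qr(cbind(1, x, x^2)))    # orthonormalize intercept, x, x^2 in order
P <- poly(x, 2)
max(abs(abs(Q[, -1]) - abs(P)))    # essentially zero: same columns up to sign

In other words, the orthogonal columns arise from the raw columns by successive orthogonalization, which is exactly a linear recombination of them.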
We can verify that the polynomials do have orthogonal columns which are also orthogonal to the intercept:
nr <- nrow(Auto)
e <- rep(1, nr) / sqrt(nr)   # constant vector of unit length
p <- cbind(e, poly(Auto$horsepower, 2))
zapsmall(crossprod(p))
##   e 1 2
## e 1 0 0
## 1 0 1 0
## 2 0 0 1
Another nice property of orthogonal polynomials: because poly produces a matrix whose columns have unit length and are mutually orthogonal (and also orthogonal to the intercept column), the reduction in the residual sum of squares from adding the cubic term is simply the squared length of the projection of the response vector onto the cubic column of the model matrix.
# these three give the same result

# 1. squared length of projection of y, i.e. Auto$mpg, on the cubic term column
crossprod(model.matrix(fm3)[, 4], Auto$mpg)^2
##        [,1]
## [1,] 15.5934

# 2. difference in residual sums of squares
deviance(fm2) - deviance(fm3)
## [1] 15.5934

# 3. difference in sums of squares from anova
anova(fm2, fm3)
##
## Analysis of Variance Table
##
## Model 1: mpg ~ poly(horsepower, 2)
## Model 2: mpg ~ poly(horsepower, 3)
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    389 7442.0
## 2    388 7426.4  1    15.593 0.8147 0.3673   <-- note Sum of Sq value
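Equivalently, because the cubic column has unit length and is orthogonal to all the other columns, the fitted cubic coefficient is itself that projection, so squaring it gives the same quantity (a quick sketch using the fits above):

coef(fm3)[4]^2   # about 15.5934, matching the Sum of Sq above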
When introducing polynomial terms in a statistical model, the usual motivation is to determine whether the response is "curved" and whether the curvature is "significant" when that term is added. The upshot of throwing in an I(x^2) term is that minor deviations may get "magnified" by the fitting process, depending on their location, and misinterpreted as due to the curvature term when they were just fluctuations at one end or the other of the range of the data. This results in inappropriate declarations of "significance".
If you just throw in a squared term with I(x^2), of necessity it is also going to be highly correlated with x, at least in the domain where x > 0. Using poly(x, 2) instead creates a "curved" set of variables where the linear term is not so highly correlated with x, and where the curvature is roughly the same across the range of the data. (If you want to read up on the statistical theory, search on "orthogonal polynomials".) Just type poly(1:10, 2) and look at the two columns.
poly(1:10, 2)
                 1           2
 [1,] -0.49543369  0.52223297
 [2,] -0.38533732  0.17407766
 [3,] -0.27524094 -0.08703883
 [4,] -0.16514456 -0.26111648
 [5,] -0.05504819 -0.34815531
 [6,]  0.05504819 -0.34815531
 [7,]  0.16514456 -0.26111648
 [8,]  0.27524094 -0.08703883
 [9,]  0.38533732  0.17407766
[10,]  0.49543369  0.52223297
attr(,"degree")
[1] 1 2
attr(,"coefs")
attr(,"coefs")$alpha
[1] 5.5 5.5
attr(,"coefs")$norm2
[1]   1.0  10.0  82.5 528.0
attr(,"class")
[1] "poly"   "matrix"
The "quadratic" term is centered on 5.5 and the linear term has been shifted down so it is 0 at the same x-point (with the implicit (Intercept)
term in the model being depended upon for shifting everything back at the time predictions are requested.)
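To see the correlation point numerically, here is a small sketch on toy positive data (analogous to horsepower, which is strictly positive):

x <- 1:100
cor(x, x^2)            # close to 1: the raw linear and squared terms are nearly collinear
cor(poly(x, 2))[1, 2]  # essentially 0: the orthogonal columns are uncorrelated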