I am trying to fit some models to some data and the resulting model predicts sensible values and the plots seem correct. But when extracting the coefficients and plotting the functions separately, they make no sense!. I am obviously doing something wrong, so please can someone tell me where the error is?
Data:
dput(distcur)
structure(list(id1 = c(1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6
), range = c(-39.898125, -21.448125, -11.07, -3.22875, 3.776484375,
12.309609375, 22.399453125, 39.235078125), meanrat = c(20.2496,
17.7504273504274, 12.76875, 2.475, -1.4295652173913, -3.9603305785124,
-14.7008547008547, -19.7366666666667)), .Names = c("id1", "range",
"meanrat"), row.names = 9:16, class = "data.frame")
library(ggplot2)
id = 1.6
degree = 3
press_x <- seq(min(distcur$range), max(distcur$range), length = 500)
moddist3b <- lm(meanrat ~ poly(range, degree), distcur)
valsdist = data.frame(predict(moddist3b, data.frame(range = press_x)))
colnames(valsdist) = "pred"
valsdist$id1 = id
allvals = cbind(valsdist, press_x)
summary(moddist3b)
#test plot
pdf(paste("mod-",measure,id ))
TITLE = paste("Distance ID: ", id, "Model = line, Points = exp1")
p = ggplot(allvals, aes(x=press_x, y=pred, colour=factor(id1))) +
geom_line() +
geom_point(data=distcur, aes(shape=factor(id1), x = range, y = meanrat, colour = factor(id1))) +
ylim(-100, 100) +
labs(title=TITLE) +
ylab("Mean Rating (%)") +
xlab(measure)
print(p)
dev.off()
I know the image is really bad quality, but it shows that it is correct. However the coefficients obtained from the model used to build the function look nothing like that plot:
summary(moddist3b)
Call:
lm(formula = meanrat ~ poly(range, degree), data = distcur)
Residuals:
9 10 11 12 13 14 15 16
-0.20134 0.44939 1.65996 -2.80500 -1.14594 2.98617 -0.92081 -0.02244
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.6770 0.8281 2.025 0.1128
poly(range, degree)1 -37.7155 2.3423 -16.102 8.7e-05 ***
poly(range, degree)2 -2.9435 2.3423 -1.257 0.2773
poly(range, degree)3 6.4888 2.3423 2.770 0.0503 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.342 on 4 degrees of freedom
Multiple R-squared: 0.9853, Adjusted R-squared: 0.9743
F-statistic: 89.51 on 3 and 4 DF, p-value: 0.0004019
Giving function y = 6.49x^3 −2.94x^2 − 37.72x + 1.68
Plotting that on google clearly shows that the function is nothing like the plot from R (from the model)
https://www.google.com/search?q=6.49x^3+%E2%88%922.94x^2+%E2%88%92+37.72x+%2B+1.68&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial&client=iceweasel-a&channel=fflb#client=iceweasel-a&rls=org.mozilla:en-US%3Aunofficial&channel=fflb&sclient=psy-ab&q=6.49*x^3+-2.94*x^2+-+37.72*x+%2B+1.68&oq=6.49*x^3+-2.94*x^2+-+37.72*x+%2B+1.68&gs_l=serp.3...3610.3975.1.4155.2.2.0.0.0.0.107.147.1j1.2.0...0.0...1c.1.14.psy-ab.4C6De6gdmtg&pbx=1&bav=on.2,or.r_qf.&bvm=bv.47008514,d.d2k&fp=5e81885614cfda4f&biw=1440&bih=667
The problem you are having has nothing to do with ggplot
. Instead, it's how you define your linear model. As an aside, the way I figured out what was going on was to predict at 0:
R> (moddist3b <- lm(meanrat ~ poly(range, 3), distcur) )
Coefficients:
(Intercept) poly(range, 3)1 poly(range, 3)2 poly(range, 3)3
1.68 -37.72 -2.94 6.49
R> predict(moddist3b, data.frame(range = 0))
1
2.733
and note that the prediction was off (it should be 1.68).
Anyway, you need to fit you model using the argument raw=TRUE
(moddist3b <- lm(meanrat ~ poly(range, 3, raw=TRUE), distcur) )
predict(moddist3b, data.frame(range = 0))
This gives you what you expect. By default, poly
works with orthogonal polynomials. See this blog post and the poly help page for further details.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With