How can you get R's glm() to match polynomial data? I've tried several iterations of 'family=AAA(link="BBB")' but I can't seem to get trivial predictions to match.
For example, please help with R's glm to match polynomial data
x=seq(-6,6,2)
y=x*x
parabola=data.frame(x,y)
plot(parabola)
model=glm(y~x,dat=parabola)
test=data.frame(x=seq(-5,5,2))
test$y=predict(model,test)
plot(test)
The plot(parabola) looks as expected, but I can find the incantation of glm() that will make plot(test) look parabolic.
I think you need to step back and start to think about a model and how you represent this in R. In your example, y is a quadratic function of x, so you need to include x and x^2 in the model formula, i.e. as predictors you need to estimate the effect of x and x^2 on the response given the data to hand.
If y is Gaussian, conditional upon the model, then you can do this with lm() and either
y ~ x + I(x^2)
or
y ~ poly(x, 2)
In the first, we wrap the quadratic term in I() as the ^ operator has a special meaning (not its mathematical one) in an R model formula. The latter version gives orthogonal polynomials and hence the x and x^2 terms won't be correlated which can help with fitting, however in some cases interpreting the coefficients is trickier with poly().
Putting it all together we have (note that I add some random error to y so as to not predict it perfectly as the example I use is more common in reality):
x <- seq(-6 ,6 ,2)
y <- x^2 + rnorm(length(x), sd = 2)
parabola <- data.frame(x = x, y = y)
mod <- lm(y ~ poly(x, 2), data = parabola)
plot(parabola)
lines(fitted(mod) ~ x, data = parabola, col = "red")
The plot produced is:

An additional issue is whether y is Gaussian? If y can't be negative (i.e. a count), and/or is discrete, modelling using lm() is going to be wrong. That's where glm() might come in, by which you might fit a curve without needing x^2 (although if the data really are a parabola, then x on its own isn't going to fit the response), as there is an explicit transformation of the data from the linear predictor on to the scale of the response.
It is better to think about the properties of the data and the sort of model you want to fit and then build up the degree of polynomial within that modelling framework, rather than jumping in a trying various incantations to simply curve fit the data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With