Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Polynomial data and R's glm()

Tags:

r

glm

How can you get R's glm() to match polynomial data? I've tried several iterations of 'family=AAA(link="BBB")' but I can't seem to get trivial predictions to match.

For example, please help with R's glm to match polynomial data

x=seq(-6,6,2)
y=x*x
parabola=data.frame(x,y)
plot(parabola)
model=glm(y~x,dat=parabola)
test=data.frame(x=seq(-5,5,2))
test$y=predict(model,test)
plot(test)

The plot(parabola) looks as expected, but I can find the incantation of glm() that will make plot(test) look parabolic.

like image 542
Jeff Taylor Avatar asked Feb 04 '26 23:02

Jeff Taylor


1 Answers

I think you need to step back and start to think about a model and how you represent this in R. In your example, y is a quadratic function of x, so you need to include x and x^2 in the model formula, i.e. as predictors you need to estimate the effect of x and x^2 on the response given the data to hand.

If y is Gaussian, conditional upon the model, then you can do this with lm() and either

y ~ x + I(x^2)

or

y ~ poly(x, 2)

In the first, we wrap the quadratic term in I() as the ^ operator has a special meaning (not its mathematical one) in an R model formula. The latter version gives orthogonal polynomials and hence the x and x^2 terms won't be correlated which can help with fitting, however in some cases interpreting the coefficients is trickier with poly().

Putting it all together we have (note that I add some random error to y so as to not predict it perfectly as the example I use is more common in reality):

x <- seq(-6 ,6 ,2)
y <- x^2 + rnorm(length(x), sd = 2)
parabola <- data.frame(x = x, y = y)

mod <- lm(y ~ poly(x, 2), data = parabola)

plot(parabola)
lines(fitted(mod) ~ x, data = parabola, col = "red")

The plot produced is:

enter image description here

An additional issue is whether y is Gaussian? If y can't be negative (i.e. a count), and/or is discrete, modelling using lm() is going to be wrong. That's where glm() might come in, by which you might fit a curve without needing x^2 (although if the data really are a parabola, then x on its own isn't going to fit the response), as there is an explicit transformation of the data from the linear predictor on to the scale of the response.

It is better to think about the properties of the data and the sort of model you want to fit and then build up the degree of polynomial within that modelling framework, rather than jumping in a trying various incantations to simply curve fit the data.

like image 68
Gavin Simpson Avatar answered Feb 08 '26 05:02

Gavin Simpson