Suppose I have x values, y values, and expected y values f (from some nonlinear best fit curve).

How can I compute R^2 in R? Note that this function is not a linear model, but a nonlinear least squares (nls) fit, so not an lm fit.
It has long been known within the mathematical literature that the coefficient of determination R^2 is an inadequate measure of goodness of fit in nonlinear models. Nevertheless, it is still frequently used within the pharmacological and biochemical literature for the analysis and interpretation of nonlinear fits to data.
Nonlinear regression is an extremely flexible analysis that can fit most any curve that is present in your data. R-squared seems like a very intuitive way to assess the goodness-of-fit for a regression model. Unfortunately, the two just don't go together.
Minitab doesn't calculate R-squared for nonlinear models because the research literature shows that it is an invalid goodness-of-fit statistic for this type of model. Using it in this context can lead to misleading conclusions about how well the model fits the data.
The value R^2 quantifies goodness of fit. It is a fraction between 0.0 and 1.0, and has no units. Higher values indicate that the model fits the data better.
You just use the lm function to fit a linear model:
x = runif(100)
y = runif(100)
spam = summary(lm(x~y))
> spam$r.squared
[1] 0.0008532386
Note that R-squared is not defined for non-linear models, or at least is very tricky to interpret. A quote from R-help:
There is a good reason that an nls model fit in R does not provide r-squared - r-squared doesn't make sense for a general nls model.
One way of thinking of r-squared is as a comparison of the residual sum of squares for the fitted model to the residual sum of squares for a trivial model that consists of a constant only. You cannot guarantee that this is a comparison of nested models when dealing with an nls model. If the models aren't nested this comparison is not terribly meaningful.
So the answer is that you probably don't want to do this in the first place.
If you want peer-reviewed evidence, see this article for example; it's not that you can't compute the R^2 value, it's just that it may not mean the same thing/have the same desirable properties as in the linear-model case.
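That said, if you still want the number, the comparison described in the quote above can be computed by hand from an nls fit. The following is only a minimal sketch with made-up data and an assumed exponential model; none of the names, data, or the model come from the question:

# Made-up example data and an assumed exponential model -- purely illustrative
set.seed(1)
x <- seq(1, 10, length.out = 100)
y <- 2 * exp(0.3 * x) + rnorm(100, sd = 1)

fit <- nls(y ~ a * exp(b * x), start = list(a = 1, b = 0.1))

rss_fit  <- sum(residuals(fit)^2)   # residual sum of squares of the nls fit
rss_null <- sum((y - mean(y))^2)    # residual sum of squares of a constant-only model
1 - rss_fit / rss_null              # the R^2-like ratio discussed above

Keep in mind that this number no longer has the "proportion of variance explained" interpretation it has for an lm fit, which is exactly the point made above.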
Sounds like f contains your predicted values. So take one minus the sum of squared distances from them to the actual values, divided by n times the variance of y; something like

1 - sum((y - f)^2) / (length(y) * var(y))

should give you a quasi R-squared value, as long as your model is reasonably close to a linear model and n is fairly large.
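For example, if the predicted values come from an nls object, f can be obtained with fitted(). The data and model below are placeholders, not taken from the question:

# Hypothetical data and model, only to show where f would come from
set.seed(2)
x <- seq(1, 10, length.out = 50)
y <- 2 * x^1.5 + rnorm(50)
fit <- nls(y ~ a * x^b, start = list(a = 1, b = 1))
f <- fitted(fit)                            # the "expected y values" from the question
1 - sum((y - f)^2) / (length(y) * var(y))   # quasi R-squared as above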