I'm getting different values of r^2 (the coefficient of determination) when I do OLS fits with these two libraries, and I can't figure out why. (Some spacing has been removed from the session below for readability.)
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import statsmodels.api as sm
In [4]: import scipy.stats
In [5]: np.random.seed(100)
In [6]: x = np.linspace(0, 10, 100) + 5*np.random.randn(100)
In [7]: y = np.arange(100)
In [8]: slope, intercept, r, p, std_err = scipy.stats.linregress(x, y)
In [9]: r**2
Out[9]: 0.22045988449873671
In [10]: model = sm.OLS(y, x)
In [11]: est = model.fit()
In [12]: est.rsquared
Out[12]: 0.5327910685035413
What is going on here? I can't figure it out! Is there an error somewhere?
This is not an answer to the original question, which has already been answered.
About R-squared in a regression without a constant.
One problem is that the standard definition of R^2 does not apply to a regression without an intercept. That is what is happening in the question: sm.OLS does not add a constant automatically, whereas scipy.stats.linregress always fits one, which is why the two calls disagree.
Essentially, R-squared as a goodness-of-fit measure in a model with an intercept compares the full model against the model that has only an intercept; the denominator is the demeaned (centered) total sum of squares because that is the residual sum of squares of the intercept-only model. If the full model does not have an intercept, then the standard definition of R^2 can produce strange results, such as a negative R^2.
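Here is a minimal sketch of that effect (made-up numbers, not the question's data): a line forced through the origin fits data with a large mean badly, and the centered definition goes negative.

import numpy as np
import statsmodels.api as sm

# Data with a large mean: a through-origin line fits it badly.
x = np.arange(1.0, 11.0)
y = 100.0 - x

est = sm.OLS(y, x).fit()          # no constant -> forced through the origin
resid = y - est.fittedvalues

# Standard (centered) definition of R^2:
r2_centered = 1 - (resid**2).sum() / ((y - y.mean())**2).sum()
print(r2_centered)                # large negative value
print(est.rsquared)               # statsmodels reports the uncentered version instead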
The conventional definition for a regression without a constant divides by the total sum of squares of the dependent variable rather than by the demeaned sum of squares, i.e. R^2 = 1 - SS_res / sum(y**2) instead of 1 - SS_res / sum((y - mean(y))**2). An R^2 from a regression with a constant and one from a regression without cannot really be compared in a meaningful way.
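To make the two libraries agree in the question's session, give statsmodels an explicit intercept column. A minimal sketch re-running that session, with sm.add_constant as the only change:

import numpy as np
import scipy.stats
import statsmodels.api as sm

np.random.seed(100)
x = np.linspace(0, 10, 100) + 5 * np.random.randn(100)
y = np.arange(100)

# scipy.stats.linregress always fits an intercept.
slope, intercept, r, p, std_err = scipy.stats.linregress(x, y)

# sm.add_constant prepends a column of ones, so the OLS model
# now has an intercept term too.
est = sm.OLS(y, sm.add_constant(x)).fit()
print(r**2)           # ~0.2205
print(est.rsquared)   # same value once the constant is included

# The no-constant fit from the question reports the uncentered R^2:
est0 = sm.OLS(y, x).fit()
resid0 = y - est0.fittedvalues
print(1 - (resid0**2).sum() / (y**2).sum())  # matches est0.rsquared (~0.5328)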
See, for example, the issue that triggered the change in statsmodels to handle R^2 "correctly" in the no-constant case: https://github.com/statsmodels/statsmodels/issues/785