This may be a stupid question, but I didn't find an answer to it anywhere in lmfit's documentation. My question is simple: how do I retrieve R squared? (I know I can calculate it manually with 1 - SS_res / SS_tot
)
Update:
I tried calculating R squared myself and compared it to the R squared from statsmodels
.
Parameters are the same in both estimations, but R squared is not.
Code:
from lmfit import minimize, Parameters
import numpy as np
import statsmodels.api as sm
import random
x = np.linspace(0, 15, 10)
x_ols = sm.add_constant(x)
y = [random.randint(0,15) for r in xrange(10)]
model = sm.OLS(y,x_ols)
results = model.fit()
print "OLS: ", format(results.params[0], '.5f'), format(results.params[1], '.5f'), "R^2: ", results.rsquared
# define objective function: returns the array to be minimized
def fcn2min(params, x, data):
a = params['a'].value
b = params['b'].value
model = a + b * x
return model - data
for i in range(0,1):
# create a set of Parameters
params = Parameters()
params.add('a', value= i)
params.add('b', value= 20)
# do fit, here with leastsq model
result = minimize(fcn2min, params, args=(x, y))
yhat = params['a'].value + params['b'].value * x
ybar = np.sum(y)/len(y)
ssreg = np.sum((yhat-ybar)**2) # or sum([ (yihat - ybar)**2 for yihat in yhat])
sstot = np.sum((y - ybar)**2) # or sum([ (yi - ybar)**2 for yi in y])
r2 = ssreg / sstot
print "lmfit: ", format(params['a'].value, '.5f'), format(params['b'].value, '.5f'), "R^2: ", r2
I don't see an included rsquared in lmfit, but we can reuse either the residuals or the redchi
I am using a similar example where y contains additional noise
lmfit result (assuming the mean residual equals zero, which is always true for linear regression)
>>> 1 - result.residual.var() / np.var(y)
0.98132815639800652
>>> 1 - result.redchi / np.var(y, ddof=2)
0.9813281563980063
compared to OLS results:
>>> results.rsquared
0.98132815639800663
This is the definition of rsquared when we compare to a model with just an intercept and without weights.
The calculations for rsquared in statsmodels are adjusted for the case when the regression does not include an intercept, and they takes weights into account for weighted least squares.
ok, the reason for that is because I chose random y's, so the fitting was poor. using a different random generator, who producs better fitting, gives an identical R squared. modification is:
y = np.linspace(0, 15, 50) + [random.randint(0,15) for r in xrange(50)]
btw, the adjusted R squared calculation is:
n = len(x)
p = len(params) - 1
r2_adj = r2 - (1-r2) * p / (n-p-1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With