When it comes to measuring goodness of fit - R-Squared seems to be a commonly understood (and accepted) measure for "simple" linear models.
But in case of statsmodels
(as well as other statistical software) RLM does not include R-squared together with regression results.
Is there a way to get it calculated "manually", perhaps in a way similar to how it is done in Stata?
Or is there another measure that can be used / calculated from the results produced by sm.RLS
?
This is what Statsmodels is producing:
import numpy as np
import statsmodels.api as sm
# Sample Data with outliers
nsample = 50
x = np.linspace(0, 20, nsample)
x = sm.add_constant(x)
sig = 0.3
beta = [5, 0.5]
y_true = np.dot(x, beta)
y = y_true + sig * 1. * np.random.normal(size=nsample)
y[[39,41,43,45,48]] -= 5 # add some outliers (10% of nsample)
# Regression with Robust Linear Model
res = sm.RLM(y, x).fit()
print(res.summary())
Which outputs:
Robust linear Model Regression Results
==============================================================================
Dep. Variable: y No. Observations: 50
Model: RLM Df Residuals: 48
Method: IRLS Df Model: 1
Norm: HuberT
Scale Est.: mad
Cov Type: H1
Date: Mo, 27 Jul 2015
Time: 10:00:00
No. Iterations: 17
==============================================================================
coef std err z P>|z| [95.0% Conf. Int.]
------------------------------------------------------------------------------
const 5.0254 0.091 55.017 0.000 4.846 5.204
x1 0.4845 0.008 61.555 0.000 0.469 0.500
==============================================================================
Robust regression uses Iteratively Reweighted Least Squares(IRLS) for Maximum Likelihood Estimation(MLE) whereas linear regression uses Ordinary Least Squares(OLS), which is the reason R-squared(coefficient of determination) is returned by lm() and not by rlm().
In section 3 we have introduced an alternative robust version of R2 based on LTS estimators. The influence function of R2 is discussed in section 4. are much more robust than others with both vertical and leverage points outliers.
Robust regression is an alternative to least squares regression when data are contaminated with outliers or influential observations, and it can also be used for the purpose of detecting influential observations.
Since an OLS return the R2, I would suggest regressing the actual values against the fitted values using simple linear regression. Regardless where the fitted values come from, such an approach would provide you an indication of the corresponding R2.
R2 is not a good measure of goodness of fit for RLM models. The problem is that the outliers have a huge effect on the R2 value, to the point where it is completely determined by outliers. Using weighted regression afterwards is an attractive alternative, but it is better to look at the p-values, standard errors and confidence intervals of the estimated coefficients.
Comparing the OLS summary to RLM (results are slightly different to yours due to different randomization):
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.726
Model: OLS Adj. R-squared: 0.721
Method: Least Squares F-statistic: 127.4
Date: Wed, 03 Nov 2021 Prob (F-statistic): 4.15e-15
Time: 09:33:40 Log-Likelihood: -87.455
No. Observations: 50 AIC: 178.9
Df Residuals: 48 BIC: 182.7
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 5.7071 0.396 14.425 0.000 4.912 6.503
x1 0.3848 0.034 11.288 0.000 0.316 0.453
==============================================================================
Omnibus: 23.499 Durbin-Watson: 2.752
Prob(Omnibus): 0.000 Jarque-Bera (JB): 33.906
Skew: -1.649 Prob(JB): 4.34e-08
Kurtosis: 5.324 Cond. No. 23.0
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Robust linear Model Regression Results
==============================================================================
Dep. Variable: y No. Observations: 50
Model: RLM Df Residuals: 48
Method: IRLS Df Model: 1
Norm: HuberT
Scale Est.: mad
Cov Type: H1
Date: Wed, 03 Nov 2021
Time: 09:34:24
No. Iterations: 17
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 5.1857 0.111 46.590 0.000 4.968 5.404
x1 0.4790 0.010 49.947 0.000 0.460 0.498
==============================================================================
If the model instance has been used for another fit with different fit parameters, then the fit options might not be the correct ones anymore .
You can see that the standard errors and size of the confidence interval decreases in going from OLS to RLM for both the intercept and the slope term. This suggests that the estimates are closer to the real values.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With