How to get R-squared for robust regression (RLM) in Statsmodels?

Tags: python, linear-regression, statsmodels, regression

When it comes to measuring goodness of fit, R-squared seems to be a commonly understood (and accepted) measure for "simple" linear models. But in statsmodels (as in other statistical software), RLM does not report R-squared with the regression results. Is there a way to calculate it "manually", perhaps similar to how it is done in Stata?

Or is there another measure that can be used / calculated from the results produced by sm.RLM?

This is what Statsmodels is producing:

import numpy as np
import statsmodels.api as sm

# Sample Data with outliers
nsample = 50
x = np.linspace(0, 20, nsample)
x = sm.add_constant(x)
sig = 0.3
beta = [5, 0.5]
y_true = np.dot(x, beta)
y = y_true + sig * 1. * np.random.normal(size=nsample)
y[[39,41,43,45,48]] -= 5   # add some outliers (10% of nsample)

# Regression with Robust Linear Model
res = sm.RLM(y, x).fit()
print(res.summary())

Which outputs:

                    Robust linear Model Regression Results                    
==============================================================================
Dep. Variable:                      y   No. Observations:                   50
Model:                            RLM   Df Residuals:                       48
Method:                          IRLS   Df Model:                            1
Norm:                          HuberT                                         
Scale Est.:                       mad                                         
Cov Type:                          H1                                         
Date:                 Mo, 27 Jul 2015                                         
Time:                        10:00:00                                         
No. Iterations:                    17                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          5.0254      0.091     55.017      0.000         4.846     5.204
x1             0.4845      0.008     61.555      0.000         0.469     0.500
==============================================================================

asked Jul 27 '15 14:07

Primer

2 Answers

Since OLS returns the R2, I would suggest regressing the actual values against the fitted values using simple linear regression. Regardless of where the fitted values come from, this approach gives you an indication of the corresponding R2.

answered Sep 24 '22 01:09

majeed simaan

R2 is not a good measure of goodness of fit for RLM models. The problem is that outliers have a huge effect on the R2 value, to the point where it can be completely determined by them. Running a weighted regression afterwards is an attractive alternative, but it is better to look at the p-values, standard errors and confidence intervals of the estimated coefficients.

Comparing the OLS summary to RLM (results are slightly different to yours due to different randomization):

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.726
Model:                            OLS   Adj. R-squared:                  0.721
Method:                 Least Squares   F-statistic:                     127.4
Date:                Wed, 03 Nov 2021   Prob (F-statistic):           4.15e-15
Time:                        09:33:40   Log-Likelihood:                -87.455
No. Observations:                  50   AIC:                             178.9
Df Residuals:                      48   BIC:                             182.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.7071      0.396     14.425      0.000       4.912       6.503
x1             0.3848      0.034     11.288      0.000       0.316       0.453
==============================================================================
Omnibus:                       23.499   Durbin-Watson:                   2.752
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               33.906
Skew:                          -1.649   Prob(JB):                     4.34e-08
Kurtosis:                       5.324   Cond. No.                         23.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                    Robust linear Model Regression Results                    
==============================================================================
Dep. Variable:                      y   No. Observations:                   50
Model:                            RLM   Df Residuals:                       48
Method:                          IRLS   Df Model:                            1
Norm:                          HuberT                                         
Scale Est.:                       mad                                         
Cov Type:                          H1                                         
Date:                Wed, 03 Nov 2021                                         
Time:                        09:34:24                                         
No. Iterations:                    17                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.1857      0.111     46.590      0.000       4.968       5.404
x1             0.4790      0.010     49.947      0.000       0.460       0.498
==============================================================================

If the model instance has been used for another fit with different fit parameters, then the fit options might not be the correct ones anymore.

You can see that the standard errors and the widths of the confidence intervals decrease in going from OLS to RLM for both the intercept and the slope. This suggests that the RLM estimates are closer to the true values (5 and 0.5 in the simulated data).

answered Sep 24 '22 01:09

Rob
