Access standardized residuals, Cook's distances, hat values (leverage), etc. easily in Python?

I am looking for influence statistics after fitting a linear regression. In R, I can obtain them like this, for example:

hatvalues(fitted_model) #hatvalues (leverage)
cooks.distance(fitted_model) #Cook's D values
rstandard(fitted_model) #standardized residuals
rstudent(fitted_model) #studentized residuals

etc.
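For reference, here is a from-scratch NumPy sketch of the textbook formulas behind those four R functions, which can help when checking any Python library's output. It assumes an OLS fit with a full-rank design matrix X that already contains the intercept column; the helper name influence_stats is hypothetical, not part of any library.

```python
import numpy as np


def influence_stats(X, y):
    """Hypothetical helper: hat values, Cook's distance, and internally /
    externally studentized residuals for an OLS fit of y on a full-rank X
    (X must already include the intercept column)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients
    resid = y - X @ beta
    # Leverage = diag of H = X (X'X)^{-1} X', computed as row sums of Q**2 from a QR decomposition
    q, _ = np.linalg.qr(X)
    hat = np.sum(q ** 2, axis=1)
    s2 = resid @ resid / (n - p)  # residual variance estimate RSS / (n - p)
    r_int = resid / np.sqrt(s2 * (1.0 - hat))  # like R's rstandard()
    r_ext = r_int * np.sqrt((n - p - 1) / (n - p - r_int ** 2))  # like R's rstudent()
    cooks = r_int ** 2 * hat / (p * (1.0 - hat))  # like R's cooks.distance()
    return hat, cooks, r_int, r_ext


# Hypothetical example data: a straight line plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=30)
X = np.column_stack([np.ones_like(x), x])  # intercept column + one predictor
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)

hat, cooks, r_int, r_ext = influence_stats(X, y)
print(hat[:3], cooks[:3], r_int[:3], r_ext[:3])
```

The externally studentized residual uses the closed-form identity instead of refitting the model n times with each observation left out; both give the same numbers.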

How can I obtain the same statistics when using statsmodels in Python after fitting a model like this:

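# import statsmodels
import statsmodels.api as sm

# Fit a linear model to any dataset (Y: response, X: design matrix;
# statsmodels does not add an intercept, so use sm.add_constant(X) if you need one)
model = sm.OLS(Y, X)
results = model.fit()

# Create a table that includes the studentized residuals
# (a pandas DataFrame when the model was fit on pandas data)
results.outlier_test()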

Edit: See answer below...

Asked Sep 19 '17 at 15:09 by Jaynes01

People also ask

How do you find the standardized residual?

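In linear regression, the standardized residual divides each residual by its estimated standard deviation: r_i = e_i / (s * sqrt(1 - h_ii)), where e_i is the residual, s is the residual standard error, and h_ii is the observation's leverage (hat value). It can be read like a standard score: when the model's assumptions hold, standardized residuals have mean 0 and standard deviation 1. (Dividing the difference between observed and expected counts by the square root of the expected count gives the Pearson residual used in chi-square tests, which is a different statistic.)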

How do you calculate studentized residuals in Python?

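Call the outlier_test() method on a fitted OLSResults object. It returns a table with the externally studentized residual of each observation, together with its unadjusted p-value and a multiple-testing-adjusted p-value (Bonferroni by default). The table is a pandas DataFrame when the model was fit on pandas data.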


2 Answers

Although the accepted answer is correct, I found it helpful to access each statistic separately as an attribute of the influence object returned by statsmodels.regression.linear_model.OLSResults.get_influence after fitting my model. This saved me from having to index the summary_frame, since I was only interested in one of the statistics and not all of them. Maybe this helps somebody else:

import statsmodels.api as sm

# Fit a linear model to any dataset (Y: response, X: design matrix)
model = sm.OLS(Y, X)
results = model.fit()

# Create an instance of influence
influence = results.get_influence()

# Leverage (hat values), like R's hatvalues()
leverage = influence.hat_matrix_diag

# Cook's D values and their p-values, returned as a tuple of arrays
cooks_d, cooks_pvalues = influence.cooks_distance

# Standardized (internally studentized) residuals, like R's rstandard()
standardized_residuals = influence.resid_studentized_internal

# Studentized (externally studentized) residuals, like R's rstudent()
studentized_residuals = influence.resid_studentized_external
Answered Sep 22 '22 at 09:09 by Scott McAllister


I found it here:

http://www.statsmodels.org/dev/generated/statsmodels.stats.outliers_influence.OLSInfluence.summary_frame.html

OLSInfluence.summary_frame()
Answered Sep 21 '22 at 09:09 by Jaynes01