 

How to retrieve model estimates from statsmodels?

From a dataset like this:

import pandas as pd
import numpy as np
import statsmodels.api as sm

# A dataframe with two variables
np.random.seed(123)
rows = 12
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['y', 'x']) 
df = df.set_index(rng)


...and a linear regression model like this:

x = sm.add_constant(df['x'])
model = sm.OLS(df['y'], x).fit()

... you can easily retrieve some model coefficients this way:

print(model.params)

const    176.636417
x         -0.357185
dtype: float64

But I just can't find out how to retrieve all other parameters from the model summary:

print(str(model.summary()))


In particular, I'm interested in R-squared.

From the post How to extract a particular value from the OLS-summary in Pandas? I learned that print(model.r2) does the trick there, but that attribute does not seem to exist in statsmodels.

Any suggestions?


People also ask

How do you find the model coefficient in Python?

You can use the params attribute of a fitted model to get the coefficients. For example, result.params prints an array like [0.89516052, 2.00334187], the estimates of the intercept and slope respectively. If you want more information, result.summary() returns three detailed tables describing the model.
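A minimal sketch of this (the variable names and synthetic data here are illustrative, not from the post):

import numpy as np
import statsmodels.api as sm

# Fit a one-variable OLS model on synthetic data
np.random.seed(0)
X = sm.add_constant(np.arange(10, dtype=float))  # prepend a column of ones
y = 1.0 + 2.0 * X[:, 1] + np.random.normal(scale=0.1, size=10)
result = sm.OLS(y, X).fit()

print(result.params)     # [intercept, slope] estimates
print(result.summary())  # the three detailed tables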

How do you get adjusted R squared in statsmodels?

Adjusted R-squared is available as the rsquared_adj attribute. Statsmodels defines it as 1 - (nobs - 1)/df_resid * (1 - rsquared) if a constant is included, and as 1 - nobs/df_resid * (1 - rsquared) if no constant is included.
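With a fitted model that includes a constant, the attribute matches the constant-included formula; a quick check, reusing the fitted model from the question:

import numpy as np

# rsquared_adj should equal the constant-included formula above
adj = 1 - (model.nobs - 1) / model.df_resid * (1 - model.rsquared)
assert np.isclose(adj, model.rsquared_adj)
print(model.rsquared_adj)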

What is the difference between statsmodels and Sklearn linear regression?

A key difference between the two libraries is how they handle the intercept. Scikit-learn lets you control it with the fit_intercept parameter, while statsmodels expects you to add a constant column to the design matrix yourself, e.g. with sm.add_constant.
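A short sketch of the two conventions (assuming scikit-learn is installed; the data is made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 + 0.5 * X[:, 0]

# scikit-learn: the intercept is a constructor parameter
sk = LinearRegression(fit_intercept=True).fit(X, y)
print(sk.intercept_, sk.coef_)

# statsmodels: add the constant column to the design matrix yourself
res = sm.OLS(y, sm.add_constant(X)).fit()
print(res.params)  # [const, x] estimates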


1 Answer

You can get R-squared like this:

Code:

model.rsquared

Test Code:

import pandas as pd
import numpy as np
import statsmodels.api as sm

# A dataframe with two variables
np.random.seed(123)
rows = 12
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['y', 'x'])
df = df.set_index(rng)

x = sm.add_constant(df['x'])
model = sm.OLS(df['y'], x).fit()

print(model.params)
print(model.rsquared)
print(str(model.summary()))

Results:

const    176.636417
x         -0.357185
dtype: float64

0.338332793094

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.338
Model:                            OLS   Adj. R-squared:                  0.272
Method:                 Least Squares   F-statistic:                     5.113
Date:                Tue, 30 Jan 2018   Prob (F-statistic):             0.0473
Time:                        05:36:04   Log-Likelihood:                -41.442
No. Observations:                  12   AIC:                             86.88
Df Residuals:                      10   BIC:                             87.85
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        176.6364     20.546      8.597      0.000     130.858     222.415
x             -0.3572      0.158     -2.261      0.047      -0.709      -0.005
==============================================================================
Omnibus:                        1.934   Durbin-Watson:                   1.182
Prob(Omnibus):                  0.380   Jarque-Bera (JB):                1.010
Skew:                          -0.331   Prob(JB):                        0.603
Kurtosis:                       1.742   Cond. No.                     1.10e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.1e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

Finding All Attribute Names:

With a small bit of code:

for attr in dir(model):
    if not attr.startswith('_'):
        print(attr)

You can see all of the public attributes on the results object:

HC0_se
HC1_se
HC2_se
HC3_se
aic
bic
bse
centered_tss
compare_f_test
compare_lm_test
compare_lr_test
condition_number
conf_int
conf_int_el
cov_HC0
cov_HC1
cov_HC2
cov_HC3
cov_kwds
cov_params
cov_type
df_model
df_resid
eigenvals
el_test
ess
f_pvalue
f_test
fittedvalues
fvalue
get_influence
get_prediction
get_robustcov_results
initialize
k_constant
llf
load
model
mse_model
mse_resid
mse_total
nobs
normalized_cov_params
outlier_test
params
predict
pvalues
remove_data
resid
resid_pearson
rsquared
rsquared_adj
save
scale
ssr
summary
summary2
t_test
tvalues
uncentered_tss
use_t
wald_test
wald_test_terms
wresid
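As a usage example, most of the numbers in the summary table can be pulled directly from these attributes of the fitted model above:

print(model.rsquared_adj)    # adjusted R-squared
print(model.f_pvalue)        # p-value of the F-statistic
print(model.aic, model.bic)  # information criteria
print(model.pvalues)         # per-coefficient p-values
print(model.conf_int())      # 95% confidence intervals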