Converting statsmodels summary object to Pandas Dataframe

I am doing multiple linear regression with statsmodels.formula.api (version 0.9.0) on Windows 10. After fitting the model and requesting the summary with the following lines, I get the result as a summary object.

X_opt = X[:, [0, 1, 2, 3]]
regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regressor_OLS.summary()


                          OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.948
Method:                 Least Squares   F-statistic:                     296.0
Date:                Wed, 08 Aug 2018   Prob (F-statistic):           4.53e-30
Time:                        00:46:48   Log-Likelihood:                -525.39
No. Observations:                  50   AIC:                             1059.
Df Residuals:                      46   BIC:                             1066.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5.012e+04   6572.353      7.626      0.000    3.69e+04    6.34e+04
x1             0.8057      0.045     17.846      0.000       0.715       0.897
x2            -0.0268      0.051     -0.526      0.602      -0.130       0.076
x3             0.0272      0.016      1.655      0.105      -0.006       0.060
==============================================================================
Omnibus:                       14.838   Durbin-Watson:                   1.282
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               21.442
Skew:                          -0.949   Prob(JB):                     2.21e-05
Kurtosis:                       5.586   Cond. No.                     1.40e+06
==============================================================================

I want to do backward elimination on the p-values at a significance level of 0.05. For this I need to remove the predictor with the highest p-value and run the code again.

I would like to know whether there is a way to extract the p-values from the summary object, so that I can run a loop with a conditional statement and find the significant variables without repeating the steps manually.

Thank you.

asked Aug 07 '18 19:08

Sagun Kayastha

3 Answers

The answer from @Michael B works well, but requires "recreating" the table. The table itself is actually directly available from the summary().tables attribute. Each table in this attribute (which is a list of tables) is a SimpleTable, which has methods for outputting different formats. We can then read any of those formats back as a pd.DataFrame:

from io import StringIO

import pandas as pd
import statsmodels.api as sm

model = sm.OLS(y, x)
results = model.fit()
results_summary = results.summary()

# Note that tables is a list. The table at index 1 is the "core" table.
# Additionally, read_html puts dfs in a list, so we want index 0.
results_as_html = results_summary.tables[1].as_html()
# Recent pandas versions expect a file-like object rather than a raw HTML string.
pd.read_html(StringIO(results_as_html), header=0, index_col=0)[0]

answered Sep 18 '22 21:09

ZaxR

Store your model fit as a variable results, like so:

import statsmodels.api as sm
model = sm.OLS(y,x)
results = model.fit()

Then create a function like the one below:

import pandas as pd

def results_summary_to_dataframe(results):
    '''Take a statsmodels results object and return its key statistics as a DataFrame.'''
    pvals = results.pvalues
    coeff = results.params
    # conf_int() is a plain array when the model was fit on NumPy data,
    # so wrap it in a DataFrame before selecting the bound columns
    conf_int = pd.DataFrame(results.conf_int())
    conf_lower = conf_int[0]
    conf_higher = conf_int[1]

    results_df = pd.DataFrame({"pvals":pvals,
                               "coeff":coeff,
                               "conf_lower":conf_lower,
                               "conf_higher":conf_higher
                                })

    # Reorder the columns
    results_df = results_df[["coeff","pvals","conf_lower","conf_higher"]]
    return results_df

You can further explore all the attributes of the results object by using dir() to print, then add them to the function and df accordingly.

answered Sep 19 '22 21:09

Michael B

An easy solution is just one line of code:

LRresult = (result.summary2().tables[1])

This will give you a DataFrame object:

type(LRresult)

pandas.core.frame.DataFrame

As ZaxR mentioned in a comment, summary2() is not yet considered stable. result.summary().tables[1] holds the same table, but as a SimpleTable rather than a DataFrame, so it has to be converted first (for example with as_html() and pd.read_html).

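To get the significant variables and run the test again:

import statsmodels.formula.api as smf

# Keep the predictors with p <= 0.05, leaving out the intercept by name
# (slicing with [1:] would drop a real predictor whenever the intercept is not significant)
significant = LRresult[LRresult['P>|z|'] <= 0.05].index
newlist = [name for name in significant if name != 'Intercept']
myform1 = 'binary_Target' + ' ~ ' + ' + '.join(newlist)

M1_test2 = smf.logit(formula=myform1, data=myM1_1)

result2 = M1_test2.fit(maxiter=200)
LRresult2 = (result2.summary2().tables[1])
LRresult2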

answered Sep 18 '22 21:09

Daniel Zhou

Tags:

python

pandas

statsmodels