Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

statsmodels: printing summary of more than one regression models together

In the Python library Statsmodels, you can print out the regression results with print(results.summary()), how can I print out the summary of more than one regressions in one table, for better comparison?

A linear regression, code taken from statsmodels documentation:

nsample = 100
x = np.linspace(0, 10, 100)
X = np.column_stack((x, x**2))
beta = np.array([0.1, 10])
e = np.random.normal(size=nsample)
y = np.dot(X, beta) + e

model = sm.OLS(y, X)
results_noconstant = model.fit()

Then I add a constant to the model and run the regression again:

beta = np.array([1, 0.1, 10])
X = sm.add_constant(X)
y = np.dot(X, beta) + e 

model = sm.OLS(y, X)
results_withconstant = model.fit()

I'd like to see the summaries of results_noconstant and results_withconstant printed out in one table. This should be a very useful function, but I didn't find any instruction about this in the statsmodels documentation.

EDIT: The regression table I had in mind would be something like this, I wonder whether there is ready-made functionality to do this.

like image 296
Olivier Ma Avatar asked Dec 02 '22 15:12

Olivier Ma


1 Answers

There is summary_col, which AFAIR is still missing from the documentation.

I have not really tried it out much, but I found a related example from an issue to remove some of the "nuisance" parameters.

"""
mailing list, and issue https://github.com/statsmodels/statsmodels/pull/1638
"""

import pandas as pd
import numpy as np
import string
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

df = pd.DataFrame({'A' : list(string.ascii_uppercase)*10,
                   'B' : list(string.ascii_lowercase)*10,
                   'C' : np.random.randn(260),
                   'D' : np.random.normal(size=260),
                   'E' : np.random.random_integers(0,10,260)})

m1 = smf.ols('E ~ D',data=df).fit()
m2 = smf.ols('E ~ D + C',data=df).fit()
m3 = smf.ols('E ~ D + C + B',data=df).fit()
m4 = smf.ols('E ~ D + C + B + A',data=df).fit()

print(summary_col([m1,m2,m3,m4]))

There is still room for improvement.

like image 127
Josef Avatar answered Dec 29 '22 10:12

Josef