Python Statsmodels QuantReg Intercept

Problem Setup

In the statsmodels Quantile Regression example, the Least Absolute Deviation summary output shows the Intercept. That example fits the model with a formula:

from __future__ import print_function
import patsy
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
from statsmodels.regression.quantile_regression import QuantReg

data = sm.datasets.engel.load_pandas().data

mod = smf.quantreg('foodexp ~ income', data)
res = mod.fit(q=.5)
print(res.summary())

                         QuantReg Regression Results                          
==============================================================================
Dep. Variable:                foodexp   Pseudo R-squared:               0.6206
Model:                       QuantReg   Bandwidth:                       64.51
Method:                 Least Squares   Sparsity:                        209.3
Date:                Fri, 09 Oct 2015   No. Observations:                  235
Time:                        15:44:23   Df Residuals:                      233
                                        Df Model:                            1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     81.4823     14.634      5.568      0.000        52.649   110.315
income         0.5602      0.013     42.516      0.000         0.534     0.586
==============================================================================

The condition number is large, 2.38e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

The Question

How can I achieve a summary output with the Intercept without using the statsmodels.formula.api as smf formula approach?

asked Oct 10 '15 05:10 by Jarad


1 Answer

Of course, as I put this question together, I figured it out. Rather than delete it, I'll share in case somebody out there ever runs across this.

As I suspected, I needed to add_constant() but I wasn't sure how. I was doing something dumb and adding the constant to the Y (endog) variable instead of the X (exog) variable.

The Answer

from __future__ import print_function
import patsy
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.regression.quantile_regression import QuantReg

data = sm.datasets.engel.load_pandas().data
data = sm.add_constant(data)

mod = QuantReg(data['foodexp'], data[['const', 'income']])
res = mod.fit(q=.5)
print(res.summary())

                         QuantReg Regression Results                          
==============================================================================
Dep. Variable:                foodexp   Pseudo R-squared:               0.6206
Model:                       QuantReg   Bandwidth:                       64.51
Method:                 Least Squares   Sparsity:                        209.3
Date:                Fri, 09 Oct 2015   No. Observations:                  235
Time:                        22:24:47   Df Residuals:                      233
                                        Df Model:                            1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         81.4823     14.634      5.568      0.000        52.649   110.315
income         0.5602      0.013     42.516      0.000         0.534     0.586
==============================================================================

The condition number is large, 2.38e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

As an FYI, what I find interesting is that add_constant() just adds a column of 1s to your data. More information about add_constant() can be found in the statsmodels documentation.

answered Sep 20 '22 02:09 by Jarad