 

Pandas rolling OLS being deprecated

Tags: python, pandas

When I run some old code, I get the following warning: "pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels". I could not work out whether statsmodels has a user-friendly rolling OLS module. What was nice about the pandas.stats.ols module was that you could easily specify whether an intercept was needed, the type of window (rolling, expanding) and the window length. Is there a module that does exactly the same?

For example:

import numpy as np
from pandas import DataFrame
from pandas.stats.ols import MovingOLS  # the deprecated module in question

YY = DataFrame(np.log(np.linspace(1, 10, 10)), columns=['Y'])
XX = DataFrame(np.transpose([np.linspace(1, 10, 10), np.linspace(2, 10, 10)]), columns=['XX1', 'XX2'])
MovingOLS(y=YY['Y'], x=XX, intercept=True, window_type='rolling', window=5).resid

I would like an example of how to get the result of the last line (the residuals of the moving OLS) using statsmodels or any other module.

Thanks

serrajo asked Jan 06 '17 10:01


1 Answer

I created an ols module designed to mimic pandas' deprecated MovingOLS; it is imported as pyfinance.ols in the example below.

It has three core classes:

  • OLS : static (single-window) ordinary least-squares regression. The outputs are NumPy arrays.
  • RollingOLS : rolling (multi-window) ordinary least-squares regression. The outputs are higher-dimensional NumPy arrays.
  • PandasRollingOLS : wraps the results of RollingOLS in pandas Series & DataFrames. Designed to mimic the look of the deprecated pandas module.

Note that the module is part of a package (which I'm currently in the process of uploading to PyPI) and it requires one inter-package import.

The first two classes above are implemented entirely in NumPy and rely primarily on matrix algebra. RollingOLS also makes extensive use of broadcasting. Attributes largely mimic those of statsmodels' OLS RegressionResultsWrapper.
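To give a feel for the matrix-algebra approach, here is a minimal sketch of a rolling OLS written directly in NumPy. It is not the module's actual code: the function name rolling_ols and its signature are made up for illustration, and it assumes NumPy 1.20 or later (for sliding_window_view) and that every window has full column rank. It solves the normal equations for all windows at once with batched matrix products.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_ols(y, x, window, intercept=True):
    """Hypothetical helper: fit one OLS regression per rolling window.

    Returns (params, resids), where params has shape (n_windows, k)
    and resids has shape (n_windows, window).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if intercept:
        # Prepend a column of ones so the first coefficient is the intercept.
        x = np.column_stack([np.ones(len(x)), x])

    # Window views: yw is (n_windows, window), xw is (n_windows, window, k).
    yw = sliding_window_view(y, window)
    xw = sliding_window_view(x, window, axis=0).transpose(0, 2, 1)

    # Normal equations for every window at once: (X'X) b = X'y.
    xt = xw.transpose(0, 2, 1)
    xtx = xt @ xw                  # (n_windows, k, k)
    xty = xt @ yw[..., None]       # (n_windows, k, 1)
    params = np.linalg.solve(xtx, xty)[..., 0]

    # Residuals of each window's own fit.
    fitted = (xw @ params[..., None])[..., 0]
    resids = yw - fitted
    return params, resids


# Synthetic data, only to show the shapes that come back.
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))
y = 1.0 + x @ np.array([2.0, -0.5]) + 0.1 * rng.normal(size=30)

params, resids = rolling_ols(y, x, window=12)
print(params.shape, resids.shape)  # (19, 3) (19, 12)
```

Each row of resids holds the residuals for one window, which is the same information that the stacked `.resids` output in the example below presents with a MultiIndex. Solving the normal equations is the simplest thing to write; for ill-conditioned or collinear regressors a least-squares solver such as np.linalg.lstsq per window would be the safer choice.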

An example:

# Pull some data from fred.stlouisfed.org
from pandas_datareader.data import DataReader

syms = {'TWEXBMTH' : 'usd', 
        'T10Y2YM' : 'term_spread', 
        'PCOPPUSDM' : 'copper'
       }
data = (DataReader(syms.keys(), 'fred', start='2000-01-01')
        .pct_change()
        .dropna())
data = data.rename(columns=syms)
print(data.head())
#                 usd  term_spread   copper
# DATE
# 2000-02-01  0.01260     -1.40909 -0.01997
# 2000-03-01 -0.00012      2.00000 -0.03720
# 2000-04-01  0.00564      0.51852 -0.03328
# 2000-05-01  0.02204     -0.09756  0.06135
# 2000-06-01 -0.01012      0.02703 -0.01850

# Rolling regressions

from pyfinance.ols import OLS, RollingOLS, PandasRollingOLS

y = data.usd
x = data.drop('usd', axis=1)

window = 12  # months
model = PandasRollingOLS(y=y, x=x, window=window)

# Here `.resids` will be a stacked, MultiIndex'd Series.  Each outer
#     index is a "period ending" and each inner index block holds the
#     subperiods for that rolling window.
print(model.resids)
# end         subperiod
# 2001-01-01  2000-02-01    0.00834
#             2000-03-01   -0.00375
#             2000-04-01    0.00194
#             2000-05-01    0.01312
#             2000-06-01   -0.01460
#             2000-07-01   -0.00462
#             2000-08-01   -0.00032
#             2000-09-01    0.00299
#             2000-10-01    0.01103
#             2000-11-01    0.00556
#             2000-12-01   -0.01544
#             2001-01-01   -0.00425

# 2017-06-01  2016-07-01    0.01098
#             2016-08-01   -0.00725
#             2016-09-01    0.00447
#             2016-10-01    0.00422
#             2016-11-01   -0.00213
#             2016-12-01    0.00558
#             2017-01-01    0.00166
#             2017-02-01   -0.01554
#             2017-03-01   -0.00021
#             2017-04-01    0.00057
#             2017-05-01    0.00085
#             2017-06-01   -0.00320
# Name: resids, dtype: float64

Brad Solomon answered Sep 29 '22 09:09