How to do OLS Regression with the latest version of Pandas

I want to run a rolling OLS regression with a window of 1000 observations on the dataset found at the following URL:

https://drive.google.com/open?id=0B2Iv8dfU4fTUa3dPYW5tejA0bzg

I tried using the following Python script with pandas version 0.20.2.

#!/usr/bin/python -tt

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.formula.api import ols

df = pd.read_csv('estimated.csv', names=('x','y'))

model = pd.stats.ols.MovingOLS(y=df.Y, x=df[['y']], 
                               window_type='rolling', window=1000, intercept=True)
df['Y_hat'] = model.y_predict

However, when I run the script I get this error: AttributeError: module 'pandas.stats' has no attribute 'ols'. The reason is that pandas.stats.ols was removed in pandas version 0.20.0, as the following link shows.

https://github.com/pandas-dev/pandas/pull/11898

How can we do OLS Regression with the latest version of Pandas?


asked Jun 22 '17 21:06

Desta Haileselassie Hagos

1 Answer

While normally I would suggest applying something like statsmodels.ols on a rolling basis*, your dataset is large (length-1000 windows on 258k rows) and you will run into a memory error that way. Therefore, you could use the linear algebra approach to calculating coefficients and then apply these coefficients to each window of your explanatory variable. For more on this, see A Matrix Formulation of the Multiple Regression Model.

* To see an implementation of statsmodels, see a wrapper I created here. An example is here.
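
As a minimal sketch of the matrix formulation mentioned above, the coefficients for a single window are b = (X'X)^-1 X'y. The snippet below uses made-up data rather than the dataset from the question; the arrays `x` and `y` and the helper name `ols_coefs` are illustrative assumptions. It solves the normal equations with `np.linalg.solve` instead of forming an explicit inverse, which is usually the more numerically stable choice, and checks the result against `np.linalg.lstsq`.

```python
import numpy as np

def ols_coefs(y, X):
    """Solve the normal equations (X'X) b = X'y for b (hypothetical helper)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data for illustration only: y = 2 + 3x + small noise.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=50)

# Prepend a column of ones so the model includes an intercept.
X = np.column_stack([np.ones_like(x), x])

beta = ols_coefs(y, X)                               # [intercept, slope]
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # reference solution

print(beta)
print(np.allclose(beta, beta_lstsq))
```

The same calculation, repeated once per window, is what the rolling approach below amounts to.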

Note that yhat here is not a single n x 1 vector; it is a collection of such vectors stacked on top of each other, i.e. you have one set of predictions per rolling 1000-period block. The shape of your predictions will therefore be (257526, 1000), as shown below.

import numpy as np
import pandas as pd

df = pd.read_csv('input/estimated.csv', names=('x','y'))

def rolling_windows(a, window):
    """Creates rolling-window 'blocks' of length `window` from `a`.

    Note that the orientation of rows/columns follows that of pandas.

    Example
    =======
    onedim = np.arange(20)
    twodim = onedim.reshape((5,4))

    print(twodim)
    [[ 0  1  2  3]
     [ 4  5  6  7]
     [ 8  9 10 11]
     [12 13 14 15]
     [16 17 18 19]]

    print(rolling_windows(onedim, 3)[:5])
    [[0 1 2]
     [1 2 3]
     [2 3 4]
     [3 4 5]
     [4 5 6]]

    print(rolling_windows(twodim, 3)[:5])
    [[[ 0  1  2  3]
      [ 4  5  6  7]
      [ 8  9 10 11]]

     [[ 4  5  6  7]
      [ 8  9 10 11]
      [12 13 14 15]]

     [[ 8  9 10 11]
      [12 13 14 15]
      [16 17 18 19]]]
    """

    if isinstance(a, (pd.Series, pd.DataFrame)):
        a = a.values
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    shape = (a.shape[0] - window + 1, window) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    windows = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    return np.squeeze(windows)

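def coefs(y, x):
    # OLS coefficients via the normal equations: b = (X'X)^-1 X'y.
    # `x` must be 2-D (observations in rows, regressors in columns).
    return np.dot(np.linalg.inv(np.dot(x.T, x)), np.dot(x.T, y))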

rendog = rolling_windows(df.x.values, 1000)
rexog = rolling_windows(df.drop('x', axis=1).values, 1000)

preds = list()
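for endog, exog in zip(rendog, rexog):
    # With a single regressor, np.squeeze leaves each window 1-D; restore
    # the column dimension so the matrix algebra in coefs() works.
    exog = exog.reshape(len(exog), -1)
    # Note: no intercept is fitted; add a column of ones to exog for one.
    pred = np.sum(coefs(endog, exog).T * exog, axis=1)
    preds.append(pred)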
preds = np.array(preds)

print(preds.shape)
(257526, 1000)
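
On NumPy 1.20 or later, `numpy.lib.stride_tricks.sliding_window_view` can build the same kind of windows without hand-written strides. The following is only a sketch on a tiny synthetic series: the data, `window = 4`, and the added intercept column are assumptions for illustration, not part of the code above. It shows where the prediction shape (n - window + 1, window) comes from.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

window = 4
x = np.arange(10, dtype=float)   # illustrative regressor, n = 10
y = 1.0 + 2.0 * x                # exact line, so every window fits perfectly

xw = sliding_window_view(x, window)  # one row per window
yw = sliding_window_view(y, window)

preds = np.empty(xw.shape)
for i, (xi, yi) in enumerate(zip(xw, yw)):
    # Design matrix for this window: intercept column plus the regressor.
    X = np.column_stack([np.ones(window), xi])
    beta, *_ = np.linalg.lstsq(X, yi, rcond=None)
    preds[i] = X @ beta          # fitted values for this window

print(preds.shape)  # (7, 4), i.e. (n - window + 1, window)
```

If only one fitted value per window is needed, taking `preds[:, -1]` keeps the fit at the last observation of each window and avoids holding the full two-dimensional array.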

Lastly: have you considered using a Random Forest Classifier here, given that your y variable is discrete?


answered Oct 18 '22 19:10

Brad Solomon

Tags:

python

python-3.x

pandas

numpy

linear-regression
