Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rolling regression by group in pandas dataframe

Tags:

python

pandas

I have a pandas dataframe with daily stock returns for individual companies from 1963-2012 (almost 60 million rows). I want to estimate the CAPM betas, so I need to run an rolling OLS regression over the past 250 days for each company and add the beta values to the existing dataframe.

I have already tried to combine the PandasRollingOLS function from the pyfinance package with 'groupby', which only returned a memory error. I have also tried to write two for loops, the first one grouping by company ('PERMNO'), the second for loop doing the rolling regression. However, this also does not work

my dataframe looks like this:

           PERMNO       RET       mkt       RF
date                                          
1986-01-08  10000 -0.024640 -0.020994  0.00025
1986-01-09  10000 -0.000250 -0.011469  0.00025
1986-01-10  10000 -0.000250 -0.000167  0.00025
1986-01-13  10000  0.049750  0.002499  0.00025
1986-01-14  10000  0.047369  0.000116  0.00025
1986-01-15  10000  0.045205  0.007956  0.00025
1986-01-16  10000  0.043228  0.004452  0.00025
1986-01-17  10000 -0.000250 -0.001991  0.00025
1986-01-20  10000 -0.000250 -0.003985  0.00025
1986-01-21  10000 -0.000250 -0.007242  0.00025

rolling = daily.groupby('PERMNO').apply(lambda x: ols.PandasRollingOLS(y=daily['RET'], x=daily['mkt'], window=250))
regression=np.zeros((len(daily.index),2))

for group_name, df_group in daily_grouped:
    for row in range(0,len(daily.index),1):
        y= daily.RET[row:row + 250]
        x= daily.mkt[row:row + 250]
        regression[row]=np.polyfit(x,y,1)

daily['beta']=regression[:,0]

I am wondering how to get the beta values from a rolling regression by company

like image 564
JL1118 Avatar asked Nov 17 '22 07:11

JL1118


1 Answers

I'm not very familiar with pyfinance, but statsmodels.regression.rolling.RollingOLS works:

# Importing
import pandas as pd
from statsmodels.regression.rolling import RollingOLS
import statsmodels.api as sm
import numpy as np

# Creating mock data
permno = []
for i in range(0, 10000):
    permno.append(int(i//2200)+10000)

daily = pd.DataFrame(data={"PERMNO": permno, "RET": np.random.random(10000)*2-1,
                           "mkt": np.random.random(10000)*2-1, "RF": np.random.random(10000)*2-1},
                     index=pd.date_range(start="1963-01-01", periods=10000, freq="B"))

# Rolling OLS calculation
rolling = daily.groupby("PERMNO").apply(lambda x: RollingOLS(endog=x['RET'], exog=sm.add_constant(x['mkt']), window=250).fit().params)

Notice that I use x["dataframe col name"] not daily["dataframe col name"]. I think this is the cause of the problem, because you use apply(lambda x: but haven't then used x at all in the computation. You may be able to change your code from daily to x and find that it works.

I have tried pyfinance, using your rolling calculation, but received the following (which is what occurs with statsmodels RollingOLS when there is no .fit().params added to the end to actually return relevant data, suggesting you need a bit more on the end of your code - possibly .beta on the end?). To change your code, if you need the y= and x=, use something else instead of x, such as lambda q, then you can use y=q['RET'], x=q['mkt'].

#Out: 
#PERMNO
#10000    <pyfinance.ols.PandasRollingOLS object at 0x00...
#10001    <pyfinance.ols.PandasRollingOLS object at 0x00...
#10002    <pyfinance.ols.PandasRollingOLS object at 0x00...
#10003    <pyfinance.ols.PandasRollingOLS object at 0x00...
#10004    <pyfinance.ols.PandasRollingOLS object at 0x00...
#dtype: object
like image 100
Rawson Avatar answered Dec 22 '22 00:12

Rawson