I have a pandas dataframe with daily stock returns for individual companies from 1963-2012 (almost 60 million rows). I want to estimate the CAPM betas, so I need to run an rolling OLS regression over the past 250 days for each company and add the beta values to the existing dataframe.
I have already tried to combine the PandasRollingOLS function from the pyfinance package with 'groupby', which only returned a memory error. I have also tried to write two for loops, the first one grouping by company ('PERMNO'), the second for loop doing the rolling regression. However, this also does not work
my dataframe looks like this:
PERMNO RET mkt RF
date
1986-01-08 10000 -0.024640 -0.020994 0.00025
1986-01-09 10000 -0.000250 -0.011469 0.00025
1986-01-10 10000 -0.000250 -0.000167 0.00025
1986-01-13 10000 0.049750 0.002499 0.00025
1986-01-14 10000 0.047369 0.000116 0.00025
1986-01-15 10000 0.045205 0.007956 0.00025
1986-01-16 10000 0.043228 0.004452 0.00025
1986-01-17 10000 -0.000250 -0.001991 0.00025
1986-01-20 10000 -0.000250 -0.003985 0.00025
1986-01-21 10000 -0.000250 -0.007242 0.00025
rolling = daily.groupby('PERMNO').apply(lambda x: ols.PandasRollingOLS(y=daily['RET'], x=daily['mkt'], window=250))
regression=np.zeros((len(daily.index),2))
for group_name, df_group in daily_grouped:
for row in range(0,len(daily.index),1):
y= daily.RET[row:row + 250]
x= daily.mkt[row:row + 250]
regression[row]=np.polyfit(x,y,1)
daily['beta']=regression[:,0]
I am wondering how to get the beta values from a rolling regression by company
I'm not very familiar with pyfinance, but statsmodels.regression.rolling.RollingOLS works:
# Importing
import pandas as pd
from statsmodels.regression.rolling import RollingOLS
import statsmodels.api as sm
import numpy as np
# Creating mock data
permno = []
for i in range(0, 10000):
permno.append(int(i//2200)+10000)
daily = pd.DataFrame(data={"PERMNO": permno, "RET": np.random.random(10000)*2-1,
"mkt": np.random.random(10000)*2-1, "RF": np.random.random(10000)*2-1},
index=pd.date_range(start="1963-01-01", periods=10000, freq="B"))
# Rolling OLS calculation
rolling = daily.groupby("PERMNO").apply(lambda x: RollingOLS(endog=x['RET'], exog=sm.add_constant(x['mkt']), window=250).fit().params)
Notice that I use x["dataframe col name"]
not daily["dataframe col name"]
. I think this is the cause of the problem, because you use apply(lambda x:
but haven't then used x
at all in the computation. You may be able to change your code from daily to x and find that it works.
I have tried pyfinance, using your rolling calculation, but received the following (which is what occurs with statsmodels RollingOLS when there is no .fit().params
added to the end to actually return relevant data, suggesting you need a bit more on the end of your code - possibly .beta
on the end?). To change your code, if you need the y=
and x=
, use something else instead of x, such as lambda q
, then you can use y=q['RET'], x=q['mkt']
.
#Out:
#PERMNO
#10000 <pyfinance.ols.PandasRollingOLS object at 0x00...
#10001 <pyfinance.ols.PandasRollingOLS object at 0x00...
#10002 <pyfinance.ols.PandasRollingOLS object at 0x00...
#10003 <pyfinance.ols.PandasRollingOLS object at 0x00...
#10004 <pyfinance.ols.PandasRollingOLS object at 0x00...
#dtype: object
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With