I'm trying to run a panel regression on pandas DataFrames.
Currently I have two DataFrames, each containing 52 rows (dates) × 99 columns (99 stocks): Markdown file with data representation
When running:
est=sm.OLS(Stockslist,averages).fit()
est.summary()
I get the ValueError: shapes (52,99) and (52,99) not aligned: 99 (dim 1) != 52 (dim 0)
Can somebody point out what I am doing wrong? The model is simply y(i,t) = x(i,t) + error term, so there is no intercept. However, I would like to add time effects in the future.
Kind regards, Jeroen
A plain OLS regression is likely to be a poor fit for panel data, because it does not account for fixed or random effects.
Panel data methods are the econometric tools used to estimate parameters, compute partial effects of interest in nonlinear models, quantify dynamic linkages, and perform valid inference when data are available on repeated cross sections.
Pooled regression is standard ordinary least squares (OLS) regression without any cross-sectional or time effects. The error structure is simply $u_{it} = \epsilon_{it}$, where the $\epsilon_{it}$ are independently and identically distributed (iid) with zero mean and variance $\sigma^2_{\epsilon}$.
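As a rough illustration of pooled regression on data shaped like the question's, the sketch below stacks two wide frames (dates as rows, stocks as columns) into one long frame and computes the no-intercept slope in closed form. The data, the tickers, the true slope of 0.5, and the names `x_wide`, `y_wide` and `long` are all made up for the example; they are not the poster's data.

```python
import numpy as np
import pandas as pd

# Hypothetical wide panels: 52 weekly dates as rows, 99 stocks as columns.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-02", periods=52, freq="W-FRI")
stocks = [f"S{i:02d}" for i in range(99)]
x_wide = pd.DataFrame(rng.normal(size=(52, 99)), index=dates, columns=stocks)
noise = rng.normal(scale=0.1, size=(52, 99))
y_wide = 0.5 * x_wide + noise  # assumed true slope 0.5, no intercept

# Stack each wide frame into a single column indexed by (date, stock).
long = pd.DataFrame({"y": y_wide.stack(), "x": x_wide.stack()})

# Pooled OLS without an intercept: beta = sum(x * y) / sum(x * x).
beta = (long["x"] * long["y"]).sum() / (long["x"] ** 2).sum()
print(len(long), beta)
```

Stacking first means the regression sees one observation per (date, stock) pair, which is what the pooled model assumes, rather than a 52 × 99 matrix on each side.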
Panel data, sometimes referred to as longitudinal data, is data that contains observations about different cross sections across time. Examples of groups that may make up panel data series include countries, firms, individuals, or demographic groups.
Try the below. I've copied the stock data from the above link and added random data for the x column. For a panel regression you need a 'MultiIndex', as mentioned in the comments.
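import numpy as np
import pandas as pd
# df starts as the wide frame copied from the link: a 'dates' column plus one column per stock
df = pd.DataFrame(df.set_index('dates').stack())
df.columns = ['y']
df['x'] = np.random.random(size=len(df.index))
df.info()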
MultiIndex: 100 entries, (2015-04-03 00:00:00, AB INBEV) to (2015-05-01 00:00:00, ZC.PA)
Data columns (total 2 columns):
y 100 non-null float64
x 100 non-null float64
dtypes: float64(2)
memory usage: 2.3+ KB
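# Legacy API: pandas.stats.plm was removed in pandas 0.20, so this needs an older pandas
from pandas.stats.plm import PanelOLS
regression = PanelOLS(y=df['y'], x=df[['x']])
regression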
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations: 100
Number of Degrees of Freedom: 2
R-squared: 0.0042
Adj R-squared: -0.0060
Rmse: 0.2259
F-stat (1, 98): 0.4086, p-value: 0.5242
Degrees of Freedom: model 1, resid 98
-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -0.0507     0.0794      -0.64     0.5242    -0.2063     0.1048
     intercept     2.1952     0.0448      49.05     0.0000     2.1075     2.2829
---------------------------------End of Summary---------------------------------
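For the time effects mentioned in the question, one common approach is the within transformation: subtract each date's cross-sectional mean from y and x, then run the pooled regression on the demeaned data. The sketch below uses only pandas and NumPy, since `pandas.stats.plm` is not available in current pandas releases. The tickers, the true slope of 2.0, and the noise-free data are assumptions chosen so the result is easy to check; this is not the output of any particular panel library.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2015-04-03", periods=5, freq="W-FRI")
stocks = ["A", "B", "C", "D"]  # hypothetical tickers
idx = pd.MultiIndex.from_product([dates, stocks], names=["date", "stock"])

x = pd.Series(rng.normal(size=len(idx)), index=idx)
# One shock per date, repeated for every stock on that date
# (from_product puts date as the outer level, so np.repeat lines up).
lam = np.repeat(rng.normal(size=len(dates)), len(stocks))
y = 2.0 * x + lam  # assumed slope 2.0; no noise, to keep the check exact

df = pd.DataFrame({"y": y, "x": x})

# Within transformation: remove each date's mean from both columns.
demeaned = df - df.groupby(level="date").transform("mean")

# No-intercept slope on the demeaned data; the date shocks cancel out.
beta = (demeaned["x"] * demeaned["y"]).sum() / (demeaned["x"] ** 2).sum()
print(beta)
```

Demeaning by date is equivalent to including one dummy variable per date, but it avoids building the dummy matrix. Standard errors would still need a degrees-of-freedom correction for the absorbed date means, which this sketch does not compute.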