
Panel regression in Python

I'm trying to run a panel regression on pandas DataFrames.

Currently I have two DataFrames, each with 52 rows (dates) and 99 columns (stocks): Markdown file with data representation

When running:

est=sm.OLS(Stockslist,averages).fit()
est.summary()

I get the ValueError: shapes (52,99) and (52,99) not aligned: 99 (dim 1) != 52 (dim 0)

Can somebody point out what I am doing wrong? The model is simply y(i,t) = x(i,t) + error term, so there is no intercept. However, I would like to add time effects in the future.

Kind regards, Jeroen

jerreyz asked Apr 17 '16 21:04



1 Answer

Try the code below. I've copied the stock data from the link above and added random data for the x column. For a panel regression you need a MultiIndex, as mentioned in the comments: one row per (date, stock) pair instead of one column per stock.

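import numpy as np
import pandas as pd

# df starts as the wide stock table from the question, with a 'dates' column.
# Stacking turns it into one row per (date, stock) pair.
df = pd.DataFrame(df.set_index('dates').stack())
df.columns = ['y']
# Random placeholder values for the regressor
df['x'] = np.random.random(size=len(df.index))
df.info()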

MultiIndex: 100 entries, (2015-04-03 00:00:00, AB INBEV) to (2015-05-01 00:00:00, ZC.PA)
Data columns (total 2 columns):
y    100 non-null float64
x    100 non-null float64
dtypes: float64(2)
memory usage: 2.3+ KB
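
Since the question has two wide frames (one for y, one for x) rather than random x values, here is a minimal sketch of stacking both onto the same (date, stock) MultiIndex. The tiny frames, the column names AAA and BBB, and the helper to_long are made up for illustration; they only stand in for Stockslist and averages.

```python
import pandas as pd

# Hypothetical stand-ins for the two wide frames in the question (3 dates x 2 stocks).
dates = pd.to_datetime(["2015-04-03", "2015-04-10", "2015-04-17"])
stocks_wide = pd.DataFrame({"dates": dates, "AAA": [1.0, 2.0, 3.0], "BBB": [4.0, 5.0, 6.0]})
averages_wide = pd.DataFrame({"dates": dates, "AAA": [0.1, 0.2, 0.3], "BBB": [0.4, 0.5, 0.6]})


def to_long(wide, name):
    """Stack a wide (date x stock) frame into one Series indexed by (date, stock)."""
    stacked = wide.set_index("dates").stack()
    return stacked.rename_axis(["date", "stock"]).rename(name)


# Both Series share the same MultiIndex, so concat lines them up row by row.
panel = pd.concat([to_long(stocks_wide, "y"), to_long(averages_wide, "x")], axis=1)
print(panel)
```

Each row of panel is then a single observation, which is the layout a regression routine expects.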

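# PanelOLS lived in pandas.stats.plm, which was removed in pandas 0.20,
# so this import only works on older pandas releases.
from pandas.stats.plm import PanelOLS

regression = PanelOLS(y=df['y'], x=df[['x']])
regression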

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         100
Number of Degrees of Freedom:   2

R-squared:         0.0042
Adj R-squared:    -0.0060

Rmse:              0.2259

F-stat (1, 98):     0.4086, p-value:     0.5242

Degrees of Freedom: model 1, resid 98

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -0.0507     0.0794      -0.64     0.5242    -0.2063     0.1048
     intercept     2.1952     0.0448      49.05     0.0000     2.1075     2.2829
---------------------------------End of Summary---------------------------------
Stefan answered Sep 23 '22 23:09