Is there an existing function in Pandas or Statsmodels to estimate fixed effects (one-way or two-way)? There used to be one in Statsmodels, but it seems to have been discontinued. Pandas has something called <code>plm</code>, but I can't import it or run it using <code>pd.plm()</code>.


Fixed effect in Pandas or Statsmodels

2 Answers

As noted in the comments, PanelOLS has been removed from Pandas as of version 0.20.0. So you really have three options:

1. If you use Python 3 you can use linearmodels as specified in the more recent answer: https://stackoverflow.com/a/44836199/3435183
2. Specify a dummy for each fixed effect in your statsmodels specification, e.g. using pd.get_dummies. This may not be feasible if the number of fixed effects is large.
3. Do some groupby-based demeaning and then use statsmodels (this works even if you're estimating lots of fixed effects). Here is a barebones version of what you could do for one-way fixed effects:

    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    import patsy

    def areg(formula, data=None, absorb=None, cluster=None):

        y, X = patsy.dmatrices(formula, data, return_type='dataframe')

        ybar = y.mean()
        y = y - y.groupby(data[absorb]).transform('mean') + ybar

        Xbar = X.mean()
        X = X - X.groupby(data[absorb]).transform('mean') + Xbar

        reg = sm.OLS(y, X)
        # Account for df loss from FE transform
        reg.df_resid -= (data[absorb].nunique() - 1)

        return reg.fit(cov_type='cluster', cov_kwds={'groups': data[cluster].values})

For example, suppose you have a panel of stock data (stock returns and other stock data for all stocks, every month over a number of months) and you want to regress returns on lagged returns with calendar-month fixed effects, where the calendar month variable is called caldt. You also want to cluster the standard errors by calendar month. You can estimate such a fixed-effects model with the following:

    reg0 = areg('ret ~ retlag', data=df, absorb='caldt', cluster='caldt')

If you are using an older version of Pandas (before 0.20.0), here is an example with time fixed effects using pandas' PanelOLS, which lives in the plm module. Note the import of PanelOLS:

    >>> from pandas.stats.plm import PanelOLS
    >>> df
                     y    x
    date       id
    2012-01-01 1   0.1  0.2
               2   0.3  0.5
               3   0.4  0.8
               4   0.0  0.2
    2012-02-01 1   0.2  0.7
               2   0.4  0.5
               3   0.2  0.3
               4   0.1  0.1
    2012-03-01 1   0.6  0.9
               2   0.7  0.5
               3   0.9  0.6
               4   0.4  0.5

Note that the dataframe must have a MultiIndex set; PanelOLS determines the time and entity effects based on the index:

    >>> reg  = PanelOLS(y=df['y'],x=df[['x']],time_effects=True)
    >>> reg

    -------------------------Summary of Regression Analysis-------------------------

    Formula: Y ~ <x>

    Number of Observations:         12
    Number of Degrees of Freedom:   4

    R-squared:         0.2729
    Adj R-squared:     0.0002

    Rmse:              0.1588

    F-stat (1, 8):     1.0007, p-value:     0.3464

    Degrees of Freedom: model 3, resid 8

    -----------------------Summary of Estimated Coefficients------------------------
          Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
    --------------------------------------------------------------------------------
                 x     0.3694     0.2132       1.73     0.1214    -0.0485     0.7872
    ---------------------------------End of Summary---------------------------------

Docstring:

    PanelOLS(self, y, x, weights = None, intercept = True, nw_lags = None,
    entity_effects = False, time_effects = False, x_effects = None,
    cluster = None, dropped_dummies = None, verbose = False, nw_overlap = False)

    Implements panel OLS.

    See ols function docs

This is another function (like fama_macbeth) where I believe the plan is to move this functionality to statsmodels.
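Since pandas.stats.plm no longer exists in current pandas, the example above cannot be run as written. As a rough cross-check, the sketch below rebuilds the same 12-row frame and applies the time effects by hand, demeaning y and x within each date. Treating `time_effects=True` as equivalent to this demeaning is an assumption for this simple case, not a reimplementation of PanelOLS, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the small panel shown above: 3 dates x 4 ids, indexed by (date, id).
dates = pd.to_datetime(["2012-01-01", "2012-02-01", "2012-03-01"])
index = pd.MultiIndex.from_product([dates, [1, 2, 3, 4]], names=["date", "id"])
df = pd.DataFrame(
    {
        "y": [0.1, 0.3, 0.4, 0.0, 0.2, 0.4, 0.2, 0.1, 0.6, 0.7, 0.9, 0.4],
        "x": [0.2, 0.5, 0.8, 0.2, 0.7, 0.5, 0.3, 0.1, 0.9, 0.5, 0.6, 0.5],
    },
    index=index,
)

# Time effects: subtract each date's mean from y and x.
within = df - df.groupby(level="date").transform("mean")

# OLS slope on the demeaned data (no intercept is needed after demeaning).
sxx = (within["x"] ** 2).sum()
beta = (within["x"] * within["y"]).sum() / sxx

# Residual degrees of freedom: 12 observations - 1 slope - 3 date means = 8.
resid = within["y"] - beta * within["x"]
df_resid = len(df) - 1 - df.index.get_level_values("date").nunique()
std_err = np.sqrt((resid ** 2).sum() / df_resid / sxx)

print(round(beta, 4), round(std_err, 4))
```

With the data shown above this prints 0.3694 0.2132, which matches the coefficient and standard error in the old PanelOLS summary.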

answered Oct 21 '22 00:10

Karl D.

There is a package called linearmodels (https://pypi.org/project/linearmodels/) that has a fairly complete fixed effects and random effects implementation including clustered standard errors. It does not use high-dimensional OLS to eliminate effects and so can be used with large data sets.

    import numpy as np
    import pandas as pd
    from linearmodels.panel import PanelOLS

    # Outer is entity, inner is time
    entity = list(map(chr, range(65, 91)))
    # 'A' is the annual frequency alias; newer pandas releases prefer 'YE'
    time = list(pd.date_range('1-1-2014', freq='A', periods=4))
    index = pd.MultiIndex.from_product([entity, time])
    df = pd.DataFrame(np.random.randn(26*4, 2), index=index, columns=['y', 'x'])

    mod = PanelOLS(df.y, df.x, entity_effects=True)
    res = mod.fit(cov_type='clustered', cluster_entity=True)
    print(res)

This produces the following output:

                              PanelOLS Estimation Summary
    ================================================================================
    Dep. Variable:                      y   R-squared:                        0.0029
    Estimator:                   PanelOLS   R-squared (Between):             -0.0109
    No. Observations:                 104   R-squared (Within):               0.0029
    Date:                Thu, Jun 29 2017   R-squared (Overall):             -0.0007
    Time:                        23:52:28   Log-likelihood                   -125.69
    Cov. Estimator:             Clustered
                                            F-statistic:                      0.2256
    Entities:                          26   P-value                           0.6362
    Avg Obs:                       4.0000   Distribution:                    F(1,77)
    Min Obs:                       4.0000
    Max Obs:                       4.0000   F-statistic (robust):             0.1784
                                            P-value                           0.6739
    Time periods:                       4   Distribution:                    F(1,77)
    Avg Obs:                       26.000
    Min Obs:                       26.000
    Max Obs:                       26.000

                                 Parameter Estimates
    ==============================================================================
                Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
    ------------------------------------------------------------------------------
    x              0.0573     0.1356     0.4224     0.6739     -0.2127      0.3273
    ==============================================================================

    F-test for Poolability: 1.0903
    P-value: 0.3739
    Distribution: F(25,77)

    Included effects: Entity

It also has a formula interface, which is similar to the one in statsmodels:

    mod = PanelOLS.from_formula('y ~ x + EntityEffects', df)
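The question also asks about two-way effects. In linearmodels this should be a matter of passing `time_effects=True` alongside `entity_effects=True`, or adding `TimeEffects` to the formula. To show what that amounts to in the balanced case, here is a sketch using only pandas and NumPy on made-up data. The helper `two_way_demean` and all variable names are hypothetical, and the simple "subtract both means, add back the grand mean" formula is only exact when every entity is observed in every period.

```python
import numpy as np
import pandas as pd

# Hypothetical balanced panel: 26 entities x 4 years, true slope 2.0.
rng = np.random.default_rng(1)
n_e, n_t = 26, 4
entity = np.repeat(np.arange(n_e), n_t)
year = np.tile(np.arange(n_t), n_e)
alpha = rng.normal(size=n_e)[entity]   # entity effects
gamma = rng.normal(size=n_t)[year]     # time effects
x = rng.normal(size=n_e * n_t) + alpha + gamma
y = 2.0 * x + alpha + gamma + rng.normal(scale=0.1, size=n_e * n_t)
df = pd.DataFrame({"entity": entity, "year": year, "y": y, "x": x})

def two_way_demean(s, data):
    """Two-way within transform; exact only for balanced panels."""
    return (s
            - s.groupby(data["entity"]).transform("mean")
            - s.groupby(data["year"]).transform("mean")
            + s.mean())

yd = two_way_demean(df["y"], df)
xd = two_way_demean(df["x"], df)
beta_within = (xd * yd).sum() / (xd ** 2).sum()

# Cross-check against explicit entity and year dummies.
D = pd.get_dummies(df[["entity", "year"]], columns=["entity", "year"],
                   drop_first=True, dtype=float)
X = np.column_stack([np.ones(len(df)), df["x"].to_numpy(), D.to_numpy()])
beta_dummy = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)[0][1]

print(beta_within, beta_dummy)
```

For unbalanced panels the simple formula no longer matches the dummy regression, so a package such as linearmodels is the safer choice there.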

answered Oct 20 '22 23:10

Kevin S

user3576212
