I have a balanced panel data set, df, that essentially consists in three variables, A, B and Y, that vary over time for a bunch of uniquely identified regions. I would like to run a regression that includes both regional (region in the equation below) and time (year) fixed effects. If I'm not mistaken, I can achieve this in different ways:
lm(Y ~ A + B + factor(region) + factor(year), data = df)
or
library(plm)
plm(Y ~ A + B,
data = df, index = c('region', 'year'), model = 'within',
effect = 'twoways')
In the second equation I specify indices (region and year), the model type ('within', FE), and the nature of FE ('twoways', meaning that I'm including both region and time FE).
Despite I seem to be doing things correctly, I get extremely different results. The problem disappears when I do not consider time fixed effects - and use the argument effect = 'individual'. What's the deal here? Am I missing something? Are there any other R packages that allow to run the same analysis?
plm is a package for R which intends to make the estimation of linear panel models straightforward. plm provides functions to estimate a wide variety of models and to make (robust) inference. Details For a gentle and comprehensive introduction to the package, please see the package's vignette.
A fixed effect model is an OLS model including a set of dummy variables for each group in your dataset. In our case, we need to include 3 dummy variable - one for each country. The model automatically excludes one to avoid multicollinearity problems.
Fixed effects is a statistical regression model in which the intercept of the regression model is allowed to vary freely across individuals or groups. It is often applied to panel data in order to control for any individual-specific attributes that do not vary across time.
In research, one way to control for differences between subjects (i.e. to “fix” the effects) is to randomly assign the participants to treatment groups and control groups. For example, one difference could be age, but by randomly assigning participants you control for age across groups.
Perhaps posting an example of your data would help answer the question. I am getting the same coefficients for some made up data. You can also use felm
from the package lfe
to do the same thing:
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)
model.a <- lm(y ~ a + b + factor(year) + factor(region), data = df)
summary(model.a)
# (Intercept) -0.0522691 0.1422052 -0.368 0.7132
# a 1.9982165 0.0101501 196.866 <2e-16 ***
# b -1.4787359 0.0101666 -145.450 <2e-16 ***
library(plm)
pdf <- pdata.frame(df, index = c("region", "year"))
model.b <- plm(y ~ a + b, data = pdf, model = "within", effect = "twoways")
summary(model.b)
# Coefficients :
# Estimate Std. Error t-value Pr(>|t|)
# a 1.998217 0.010150 196.87 < 2.2e-16 ***
# b -1.478736 0.010167 -145.45 < 2.2e-16 ***
library(lfe)
model.c <- felm(y ~ a + b | factor(region) + factor(year), data = df)
summary(model.c)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# a 1.99822 0.01015 196.9 <2e-16 ***
# b -1.47874 0.01017 -145.4 <2e-16 ***
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With