R - Plm and lm - Fixed effects

Tags: r, regression, plm

I have a balanced panel data set, df, that essentially consists of three variables, A, B and Y, that vary over time for a bunch of uniquely identified regions. I would like to run a regression that includes both region (region in the formulas below) and time (year) fixed effects. If I'm not mistaken, I can achieve this in two different ways:

lm(Y ~ A + B + factor(region) + factor(year), data = df)

library(plm)
plm(Y ~ A + B, 
    data = df, index = c('region', 'year'), model = 'within',
    effect = 'twoways')

In the second call I specify the indices (region and year), the model type ('within', i.e. fixed effects), and the kind of fixed effects ('twoways', meaning that I'm including both region and time fixed effects).

Although I seem to be doing things correctly, I get extremely different results from the two approaches. The problem disappears when I leave out the time fixed effects and use the argument effect = 'individual'. What's the deal here? Am I missing something? Are there any other R packages that would let me run the same analysis?
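Since the question takes for granted that the panel is balanced and that each region-year pair is unique, a quick check of that assumption may be worth running before comparing estimators. The sketch below uses base R only and a hypothetical toy data frame (`toy`, with made-up region names); it is a sanity check under those assumptions, not a diagnosis of the discrepancy.

```r
# Hypothetical toy panel: 3 regions observed over 4 years
toy <- expand.grid(region = c("north", "south", "west"), year = 2001:2004)

# Count the rows in each region-year cell
cell_counts <- table(toy$region, toy$year)

# Balanced with unique identifiers: every cell holds exactly one row
is_balanced <- all(cell_counts == 1)
is_balanced

# Dropping a single row breaks the balance, and the same check flags it
toy_unbalanced <- toy[-1, ]
still_balanced <- all(table(toy_unbalanced$region, toy_unbalanced$year) == 1)
still_balanced
```

Applied to the real data, the same `table()` call on the region and year columns would reveal duplicated or missing region-year pairs.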


asked Apr 26 '17 14:04

Jasper

1 Answer

Posting an example of your data would probably help in answering the question. I get the same coefficients with some made-up data. You can also use felm from the lfe package to do the same thing:

# Simulate a balanced panel: 100 regions observed over 100 years
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
                 region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)

# Approach 1: lm with region and year dummies
model.a <- lm(y ~ a + b + factor(year) + factor(region), data = df)
summary(model.a)
#  (Intercept)       -0.0522691  0.1422052   -0.368   0.7132    
#  a                  1.9982165  0.0101501  196.866   <2e-16 ***
#  b                 -1.4787359  0.0101666 -145.450   <2e-16 ***

library(plm)
# Approach 2: declare the panel structure, then fit the two-way within model
pdf <- pdata.frame(df, index = c("region", "year"))

model.b <- plm(y ~ a + b, data = pdf, model = "within", effect = "twoways")
summary(model.b)

# Coefficients :
#    Estimate Std. Error t-value  Pr(>|t|)    
# a  1.998217   0.010150  196.87 < 2.2e-16 ***
# b -1.478736   0.010167 -145.45 < 2.2e-16 ***

library(lfe)

# Approach 3: felm takes the fixed effects after the "|" in the formula
model.c <- felm(y ~ a + b | factor(region) + factor(year), data = df)
summary(model.c)

# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# a  1.99822    0.01015   196.9   <2e-16 ***
# b -1.47874    0.01017  -145.4   <2e-16 ***
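To see why the slopes should agree in a balanced panel, here is a minimal sketch in base R only. It applies the two-way within transformation by hand (subtract the region mean and the year mean, add back the overall mean) and compares the result with the dummy-variable regression. The data frame `toy` and the helper `demean2` are hypothetical names introduced for illustration; this is not how plm or lfe are implemented internally.

```r
# Hypothetical balanced panel: 20 regions observed over 10 years
set.seed(1)
n_region <- 20
n_year <- 10
toy <- expand.grid(year = seq_len(n_year), region = seq_len(n_region))
toy$a <- rnorm(nrow(toy))
toy$b <- rnorm(nrow(toy))
toy$y <- 2 * toy$a - 1.5 * toy$b + rnorm(nrow(toy))

# Two-way within transformation for a balanced panel:
# x_it - mean over t for unit i - mean over i for time t + overall mean
demean2 <- function(x, unit, time) {
  x - ave(x, unit) - ave(x, time) + mean(x)
}

toy$y_w <- demean2(toy$y, toy$region, toy$year)
toy$a_w <- demean2(toy$a, toy$region, toy$year)
toy$b_w <- demean2(toy$b, toy$region, toy$year)

# Dummy-variable regression: explicit region and year factors
fit_lsdv <- lm(y ~ a + b + factor(region) + factor(year), data = toy)

# Within regression: OLS on the demeaned variables, without an intercept
fit_within <- lm(y_w ~ a_w + b_w - 1, data = toy)

coef_lsdv <- unname(coef(fit_lsdv)[c("a", "b")])
coef_within <- unname(coef(fit_within))

# TRUE when both routes give the same slopes, up to numerical tolerance
same_slopes <- isTRUE(all.equal(coef_lsdv, coef_within))
same_slopes
```

Only the slope estimates are compared here. The standard errors reported by the demeaned lm fit are not adjusted for the degrees of freedom used up by the fixed effects, so they should not be read as equivalent to the ones above.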

answered Sep 18 '22 14:09

maccruiskeen
