Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Question about p-values with clustered standard errors in LFE package in R

Tags:

r

lfe

I am estimating a model with fixed effects and clustered standard errors using the lfe-package.

As it turns out, I have a huge t-value (23.317) but only a comparatively small p-value (0.0273). This seems to have something to do with me using the projecting out of fixed effects. When I estimate the fixed effects manually as control variables, my p-value is too small to be reported <2e-16 .

Consider the following working example (I am sorry if it's more complicated than strictly necessary, I am trying to be close to my application):

I am simply estimating a pooled panel estimator of 10 time series over 50 periods. And I assume that there are two clusters in the time series.

library(data.table)
library(lfe)


x <- rnorm(50, mean = 1, sd = 1)
common_shock <- rnorm(50, mean = 0, sd = 1)


y1 = 0.5 + 5*x + rnorm(50, mean = 0, sd = 2) + common_shock
y2 = 0.5 + 5*x + rnorm(50, mean = 0, sd = 2) + common_shock
y3 = 0.5 + 5*x + rnorm(50, mean = 0, sd = 2) + common_shock
y4 = 0.5+ 5*x + rnorm(50, mean = 0, sd = 2) + common_shock
y5 = 0.5+ 5*x + rnorm(50, mean = 0, sd = 2) + common_shock
y6 = x + rnorm(50, mean = 0, sd = 2)
y7 = x + rnorm(50, mean = 0, sd = 2)
y8 = x + rnorm(50, mean = 0, sd = 2)
y9 = x + rnorm(50, mean = 0, sd = 2)

y10 = x + rnorm(50, mean = 0, sd = 2)

DT <- data.table(periods = 1:50, y1, y2, y3, y4, y5, y6, y7, y8, y9, y10)
Controls <- data.table(periods = 1:50, x)
indicators <- data.table(y_label = paste0("y", 1:10),
                         indicator = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0))

DT <- melt(DT, id.vars= c("periods"))

DT <- merge(DT, Controls, by="periods", all = TRUE)
DT <- merge(DT, indicators, by.x="variable", by.y="y_label", all = TRUE)


results <- felm(as.formula("value ~ -1 + indicator + x:indicator  | periods | 0 | periods + indicator"), data = DT)
results2 <- felm(as.formula("value ~ -1 + indicator + x:indicator + as.factor(periods) | 0 | 0 | periods + indicator"), data = DT)
summary(results)
summary(results2)

The first results gives me

indicator:x 3.8625 0.1657 23.317 0.0273 *

The second results2 gives me

indicator:x 3.86252 0.20133 19.185 < 2e-16 ***

So it must be related to the projecting out of fixed effects, but this difference is so huge, that I would like to know a bit more about it. Does someone know what the underlying issue is here?

like image 515
Florestan Avatar asked Oct 16 '22 04:10

Florestan


1 Answers

You're attempting to adjust your standard errors for clustering on "indicator" which is binary.

table(DT$indicator)
#   0   1 
# 250 250 

In other words, you only have two clusters. Your first "results" seem to be "correct", since they give correctly 1 as degrees of freedom.

(df1 <- results$df)
# [1] 1

Whereas "results2" has 448 degrees of freedom.

(df2 <- results2$df)
# [1] 448

When we calculate the p-values per hand, we may replicate your first result using one degree of freedom (as it should be with only two clusters), your second one with 448 degrees of freedom.

PV <- function(x, df) 2 * pt(-abs(x), df=df)

r1 <- summary(results)$coe
t1 <- r1[grep("indicator:x", rownames(r1)), "t value"]
PV(t1, df1)
# [1] 0.02937402

r2 <- summary(results2)$coe
t2 <- r2[grep("indicator:x", rownames(r2)), "t value"]
PV(t2, df2)
# [1] 2.371641e-55

It seems that felm can't deal with a factor variable as fixed effects, since its standard notation is y ~ x1 + x2 | f1 + f2 | (Q|W ~ x3+x4) | clu1 + clu2.

Note, that your first result is not "correct" even when it's adjusted to the degrees of freedom. Just two clusters don't make much sense to me, perhaps you may want to overthink your model. Regardless, if you have fewer than ~50 clusters, you should use something like the wild cluster bootstrap method (see Cameron and Miller, 2015).


Data:

I used your data with set.seed(42).

like image 112
jay.sf Avatar answered Oct 18 '22 16:10

jay.sf