How can I calculate the heteroskedastic consistent covariance matrix (HCCM) for a univariate regression (i.e., an OLS regression with one regressor and no intercept)?
Back story: I have a fixed effect panel regression, but with only one regressor. So I can do it with either the least squares dummy variable (LSDV) approach
lm.ser.yr <- lm(realwage ~ placebo.ser + factor(incyear) + 0, data = cps)
or I can demean the LHS at the year level and regress
cps <- ddply(cps, .(incyear), transform, realwage.dm.yr = realwage - mean(realwage))
lm.ser.yr <- lm(realwage.dm.yr ~ placebo.ser + 0, data = cps)
But there are problems with both.
With the LSDV approach and enough dummies the lm object gets huge (1 gb each). This is mostly because of the large qr matrix that I have to keep to pass it to vcovHC() to calculate the HCCM, which is frequently stops on "can't allocate vector of size" errors (or does a lot of paging).
With the demeaned within estimator everything works great, but I can't calculate the HCCM because neither vcovHC() or hccm() have handle the no intercept univariate lm object (i.e., it spits back NA, which as best I can tell is because without the intercept and dummy variables, my residuals are much farther from zero mean).
Is there a solution to this short of very aggressively managing memory and/or moving to the cloud?
vcovHC() from the plm package seems to handle lm() models without intercepts perfectly fine:
library(plm)
df <- data.frame('a'=rnorm(1000), 'b'=rnorm(1000))
mod <- lm(a ~ b -1, df)
vcovHC(mod)
Edit: here's some code to calculate them manually.
library(Matrix)
e2 = mod$residuals ^ 2
X = model.matrix(mod)
N = nrow(X)
bread = solve(crossprod(X))
I <- Matrix(data=0, ncol=N, nrow=N, sparse=TRUE)
diag(I) <- e2
salami <- t(X) %*% I %*% X
V = bread %*% salami %*% bread
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With