Is there an easy way to do a fixed-effects regression in R when the number of dummy variables leads to a model matrix that exceeds the R maximum vector length? E.g.,
> m <- lm(log(bid) ~ after + I(after*score) + id, data = data)
Error in model.matrix.default(mt, mf, contrasts) :
cannot allocate vector of length 905986769
where id is a factor (and is the variable causing the problem above).
I know that I could go through and de-mean all the data, but this throws the standard errors off (yes, you could compute the SE's "by hand" w/ a df adjustment but I'd like to minimize the probability that I'm introducing new errors). I've looked at the plm package but it seems only designed for classical panel data w/ a time component, which is not the structure of my data.
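For reference, the "by hand" route mentioned above can be sketched in base R roughly as follows. This is only an illustration on simulated data, not the poster's data: the names d, id, x and y are made up for the example. It assumes a one-way fixed effect and plain homoskedastic standard errors.

```r
# Simulated data; all names here (d, id, x, y) are hypothetical.
set.seed(1)
n_groups <- 50
n_per    <- 8
d <- data.frame(id = factor(rep(seq_len(n_groups), each = n_per)))
alpha <- rnorm(n_groups)                    # group-level fixed effects
d$x <- rnorm(nrow(d)) + alpha[d$id]         # regressor correlated with the effects
d$y <- 1.5 * d$x + alpha[d$id] + rnorm(nrow(d))

# Within transformation: subtract each group's mean (no dummy columns are built).
d$x_dm <- d$x - ave(d$x, d$id)
d$y_dm <- d$y - ave(d$y, d$id)

# OLS on the de-meaned data; no intercept, since de-meaning removes it.
fit_dm <- lm(y_dm ~ x_dm - 1, data = d)

# lm() reports N - k residual df, but estimating the group means used n_groups more.
k        <- length(coef(fit_dm))
df_naive <- df.residual(fit_dm)             # N - k
df_true  <- nrow(d) - n_groups - k          # N - n_groups - k
se_naive <- sqrt(diag(vcov(fit_dm)))
se_adj   <- se_naive * sqrt(df_naive / df_true)

# Reference: the dummy-variable fit, feasible here only because the data are small.
fit_lsdv <- lm(y ~ x + id, data = d)
se_lsdv  <- coef(summary(fit_lsdv))["x", "Std. Error"]

print(c(adjusted = unname(se_adj), lsdv = se_lsdv))
```

The point estimate is the same in both fits; only the residual degrees of freedom differ, which is why a single scaling factor is enough to line the standard errors up.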
Here, regression is your stored fixed-effects result from plm. And is Dummy now treated as a dummy variable? Yes. Note that Factor(Dummy) will return an error because R is case-sensitive; use factor(Dummy) instead. Also, if Dummy is truly a 0/1 variable, wrapping it in factor() is unnecessary; simply use Dummy as is in the model formula.
Are the estimated dummy variables the fixed effect, or do they simply absorb the fixed effect (and other variables invariant across the other dimensions of the data)? To be clear, estimating your equation via least squares dummy variables (LSDV) is algebraically equivalent to estimation in deviations from means.
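One way to see the equivalence is to fit both versions on a small simulated data set and check that the dummy coefficients from LSDV match the group intercepts implied by the de-meaned fit. This is a sketch only: id, x, y and the true slope of 0.7 are invented for the illustration, and it covers just the one-way case.

```r
# Simulated one-way panel; all names and values are hypothetical.
set.seed(2)
id <- factor(rep(1:20, each = 6))
a  <- rnorm(20)                       # true group effects
x  <- rnorm(120) + a[id]
y  <- 0.7 * x + a[id] + rnorm(120)

# LSDV: one intercept per level of id ("- 1" drops the common intercept).
lsdv <- lm(y ~ x + id - 1)

# Within estimator: regress de-meaned y on de-meaned x.
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
beta_within <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]

# Group intercepts implied by the within fit: mean(y_i) - beta * mean(x_i).
alpha_hat <- tapply(y, id, mean) - beta_within * tapply(x, id, mean)

# Compare the slope, then the per-group intercepts against the LSDV dummies.
dummy_coefs <- coef(lsdv)[paste0("id", levels(id))]
print(all.equal(unname(coef(lsdv)["x"]), beta_within))
print(all.equal(unname(dummy_coefs), as.vector(alpha_hat)))
```

In this setup the dummy coefficients are the estimated group-specific intercepts, so they pick up everything that is constant within a group, not just one named variable.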
First, it's clear from the first specification above that an FE regression model can be implemented with R's OLS regression function, lm(), simply by fitting an intercept for each level of a factor that indexes each subject in the data.
The fixed effects model can be generalized to contain more than just one determinant of Y that is correlated with X and changes over time. Key Concept 10.2 presents the generalized fixed effects regression model, with i = 1, …, n and t = 1, …, T.
The plm package will work fine for this sort of data. The time component is not required.
> library(plm)
> data("Produc", package="plm")
> zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, index=c("state"))
> zz2 <- lm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp+factor(state), data=Produc)
> summary(zz)$coefficients[,1:3]
              Estimate   Std. Error    t-value
log(pcap) -0.026149654 0.0290015755 -0.9016632
log(pc)    0.292006925 0.0251196728 11.6246309
log(emp)   0.768159473 0.0300917394 25.5272539
unemp     -0.005297741 0.0009887257 -5.3581508
> summary(zz2)$coefficients[1:5,1:3]
                Estimate   Std. Error    t value
(Intercept)  2.201617056 0.1760038727 12.5089126
log(pcap)   -0.026149654 0.0290015755 -0.9016632
log(pc)      0.292006925 0.0251196728 11.6246309
log(emp)     0.768159473 0.0300917394 25.5272539
unemp       -0.005297741 0.0009887257 -5.3581508