
Fixed effects regression in R (with a very large number of dummy variables)

Tags: r, plm

Is there an easy way to do a fixed-effects regression in R when the number of dummy variables leads to a model matrix that exceeds the R maximum vector length? E.g.,

> m <- lm(log(bid) ~ after + I(after*score) + id, data = data)
Error in model.matrix.default(mt, mf, contrasts) : 
cannot allocate vector of length 905986769

where id is a factor (and is the variable causing the problem above).

I know that I could go through and de-mean all the data, but this throws the standard errors off, because lm() does not account for the degrees of freedom used up by the group means. (Yes, you could compute the SEs "by hand" with a df adjustment, but I'd like to minimize the probability that I'm introducing new errors.) I've looked at the plm package, but it seems designed only for classical panel data with a time component, which is not the structure of my data.

asked Mar 01 '10 12:03 by John Horton



1 Answer

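plm will work fine for this sort of data. The time component is not required: passing only the individual index (index=c("state") below) is enough, and the coefficients and standard errors match those from lm() with factor(state) dummies.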

> library(plm)
> data("Produc", package="plm")
> zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, index=c("state"))
> zz2 <- lm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp+factor(state), data=Produc)
> summary(zz)$coefficients[,1:3]
              Estimate   Std. Error    t-value
log(pcap) -0.026149654 0.0290015755 -0.9016632
log(pc)    0.292006925 0.0251196728 11.6246309
log(emp)   0.768159473 0.0300917394 25.5272539
unemp     -0.005297741 0.0009887257 -5.3581508
> summary(zz2)$coefficients[1:5,1:3]
                Estimate   Std. Error    t value
(Intercept)  2.201617056 0.1760038727 12.5089126
log(pcap)   -0.026149654 0.0290015755 -0.9016632
log(pc)      0.292006925 0.0251196728 11.6246309
log(emp)     0.768159473 0.0300917394 25.5272539
unemp       -0.005297741 0.0009887257 -5.3581508
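
For reference, the "by hand" route mentioned in the question (de-mean within each group, then correct the degrees of freedom) can be sketched in base R. This is a minimal illustration on simulated data, not the asker's data: the variable names (n_id, n_obs, alpha, x, y) and the data-generating process are assumptions made up for the example. It never builds the dummy-variable model matrix for the de-meaned fit, and it checks the result against lm() with dummies, which is only feasible here because the number of groups is small.

```r
# Simulated data: a hypothetical stand-in for the question's bid data.
set.seed(1)
n_id  <- 50                                  # number of groups (levels of id)
n_obs <- 10                                  # observations per group
id    <- factor(rep(seq_len(n_id), each = n_obs))
alpha <- rnorm(n_id)[as.integer(id)]         # group-specific intercepts
x     <- rnorm(n_id * n_obs) + alpha         # regressor correlated with the group effect
y     <- 1 + 2 * x + alpha + rnorm(n_id * n_obs)

# Within transformation: subtract each group's mean from y and x.
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)

# OLS on the de-meaned data; no intercept, since all group means are now zero.
fit_dm <- lm(y_dm ~ x_dm - 1)

# lm() uses N - K residual degrees of freedom, but estimating the group
# means used up n_id more, so rescale the standard error to N - n_id - K.
N        <- length(y)
K        <- length(coef(fit_dm))
se_naive <- sqrt(diag(vcov(fit_dm)))
se_fe    <- se_naive * sqrt((N - K) / (N - n_id - K))

# Reference fit: least squares dummy variables (LSDV).
fit_lsdv <- lm(y ~ x + id)
ref <- summary(fit_lsdv)$coefficients["x", c("Estimate", "Std. Error")]

# Side-by-side comparison of the slope estimate and its standard error.
print(rbind(demeaned = c(coef(fit_dm), se_fe), lsdv = ref))
```

The two rows should agree up to floating-point error, since the de-meaned regression and LSDV are algebraically equivalent once the degrees of freedom are corrected. With several regressors the same scaling factor applies to every standard error.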
answered Nov 13 '22 21:11 by Eduardo Leoni