Fixed effects regression in R (with a very large number of dummy variables)

Tags:

plm

Is there an easy way to do a fixed-effects regression in R when the number of dummy variables leads to a model matrix that exceeds the R maximum vector length? E.g.,

> m <- lm(log(bid) ~ after + I(after*score) + id, data = data)
Error in model.matrix.default(mt, mf, contrasts) : 
cannot allocate vector of length 905986769

where id is a factor (and is the variable causing the problem above).

I know that I could de-mean all the data myself, but that throws the standard errors off. (Yes, you could compute the SEs "by hand" with a degrees-of-freedom adjustment, but I'd like to minimize the chance of introducing new errors.) I've looked at the plm package, but it seems designed only for classical panel data with a time component, which is not the structure of my data.


asked Mar 01 '10 12:03

John Horton

1 Answer

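plm will work fine for this sort of data; the time component is not required. Pass only the grouping variable as the index, and plm's default "within" model gives the same slope estimates and standard errors as lm with one dummy per group, without building the full dummy matrix: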

> library(plm)
> data("Produc", package="plm")
> zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, index=c("state"))
> zz2 <- lm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp+factor(state), data=Produc)
> summary(zz)$coefficients[,1:3]
              Estimate   Std. Error    t-value
log(pcap) -0.026149654 0.0290015755 -0.9016632
log(pc)    0.292006925 0.0251196728 11.6246309
log(emp)   0.768159473 0.0300917394 25.5272539
unemp     -0.005297741 0.0009887257 -5.3581508
> summary(zz2)$coefficients[1:5,1:3]
                Estimate   Std. Error    t value
(Intercept)  2.201617056 0.1760038727 12.5089126
log(pcap)   -0.026149654 0.0290015755 -0.9016632
log(pc)      0.292006925 0.0251196728 11.6246309
log(emp)     0.768159473 0.0300917394 25.5272539
unemp       -0.005297741 0.0009887257 -5.3581508
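
For reference, the de-meaning route mentioned in the question can be sketched in base R. This is only an illustration on simulated data, not part of the original answer: the variable names (`id`, `x`, `y`, `n_groups`) and the data-generating process are assumptions made up for the example. It shows that subtracting group means reproduces the dummy-variable slope, and that the naive standard error only needs rescaling for the degrees of freedom used up by the group means.

```r
# Simulated data (hypothetical): 200 groups with 5 observations each
set.seed(1)
n_groups <- 200
id <- factor(rep(seq_len(n_groups), each = 5))
x  <- rnorm(length(id))
alpha <- rnorm(n_groups)[as.integer(id)]   # group-specific intercepts
y  <- alpha + 2 * x + rnorm(length(id))

# Within transformation: subtract group means; no dummy matrix is built
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
fit_dm <- lm(y_dm ~ x_dm - 1)

# Reference fit with explicit dummies (feasible only because the example is small)
fit_lsdv <- lm(y ~ x + id)

# The de-meaned fit does not know that n_groups means were estimated,
# so its residual degrees of freedom are too large; rescale the SE.
n <- length(y)
k <- 1                                     # number of slope coefficients
se_naive <- summary(fit_dm)$coefficients["x_dm", "Std. Error"]
se_fixed <- se_naive * sqrt((n - k) / (n - n_groups - k))
se_lsdv  <- summary(fit_lsdv)$coefficients["x", "Std. Error"]

# Both comparisons should hold up to numerical tolerance
slopes_match <- isTRUE(all.equal(unname(coef(fit_dm)["x_dm"]),
                                 unname(coef(fit_lsdv)["x"])))
se_match <- isTRUE(all.equal(se_fixed, se_lsdv))
```

With several regressors the same scaling factor applies to every slope's standard error, with `k` set to the number of slopes. plm performs this bookkeeping internally, which is why it is the less error-prone choice.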


answered Nov 13 '22 21:11

Eduardo Leoni
