I'm using R to run a Monte Carlo simulation studying the performance of panel data estimators. Because I'll be running a large number of trials, I need at least decent performance from my code.
Using Rprof on 10 trials of my simulation shows that a significant portion of time is spent in calls to summary.plm. The first few lines of the summaryRprof output are provided below:
$by.total
                        total.time total.pct self.time self.pct
"trial"                      54.48     100.0      0.00      0.0
"coefs"                      53.90      98.9      0.06      0.1
"model.matrix"               36.72      67.4      0.10      0.2
"model.matrix.pFormula"      35.98      66.0      0.06      0.1
"summary"                    33.82      62.1      0.00      0.0
"summary.plm"                33.80      62.0      0.08      0.1
"r.squared"                  29.00      53.2      0.02      0.0
"FUN"                        24.84      45.6      7.52     13.8
I'm calling summary in my code because I need to get the standard errors of the coefficient estimates as well as the coefficients themselves (which I could get from just the plm object). My call looks like:
regression <- plm(g ~ y0 + Xit, data=panel_data, model=model, index=c("country","period"))
coefficients_estimated <- summary(regression)$coefficients[,"Estimate"]
ses_estimated <- summary(regression)$coefficients[,"Std. Error"]
I have a nagging feeling that this is a huge waste of CPU time, but I don't know enough about how R does things to avoid calling summary. I'd appreciate any information on what's going on behind the scenes here, or some way of reducing the time it takes for this to execute.
You just need to look inside plm:::summary.plm to see what it is doing. When you do, you'll see that your two lines calling summary() on your model fit can be replaced with:
coefficients_estimated <- coef(regression)
ses_estimated <- sqrt(diag(vcov(regression)))
For example:
require(plm)
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc, index = c("state","year"))
summary(zz)
gives:
> summary(zz)
Oneway (individual) effect Within Model
....
Coefficients :
             Estimate  Std. Error t-value  Pr(>|t|)
log(pcap) -0.02614965  0.02900158 -0.9017    0.3675
log(pc)    0.29200693  0.02511967 11.6246 < 2.2e-16 ***
log(emp)   0.76815947  0.03009174 25.5273 < 2.2e-16 ***
unemp     -0.00529774  0.00098873 -5.3582 1.114e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
....
and the two lines I showed return the following for zz:
> coef(zz)
log(pcap) log(pc) log(emp) unemp
-0.026149654 0.292006925 0.768159473 -0.005297741
> sqrt(diag(vcov(zz)))
log(pcap) log(pc) log(emp) unemp
0.0290015755 0.0251196728 0.0300917394 0.0009887257
You don't really provide enough information (neither your simulation code nor the full output from Rprof(), for example) to say whether this will help. It certainly doesn't look like vast amounts of time are being spent in summary(); FUN is far more costly than anything else you show, and of the elements you do show, r.squared() is the only one that appears in plm:::summary.plm(), and its self time is negligible.
So, whether the above speeds things up appreciably remains to be seen.
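If you want a quick check before changing your simulation, you could time the two approaches on the zz fit above. This is a minimal sketch using base R's system.time() and replicate(); the repetition count of 1000 is arbitrary:
## Extract coefficients and standard errors via summary()
system.time(replicate(1000, {
  s <- summary(zz)
  cbind(s$coefficients[, "Estimate"], s$coefficients[, "Std. Error"])
}))
## Extract them directly from the fitted model
system.time(replicate(1000, cbind(coef(zz), sqrt(diag(vcov(zz))))))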
If you want to take things further, then have a look at the actual function code of plm:::plm. You will notice that there is a lot of argument checking before a final call to plm:::plm.fit. You could, if you really wanted, skip straight to plm.fit.
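To see what I mean, you can inspect the internals from the console; these are just standard base R introspection tools:
plm:::plm            # printing the wrapper shows the argument checking
args(plm:::plm.fit)  # shows what the internal fitting function expects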
One final point: you mention that your problem is a Monte Carlo simulation. Can you leverage parallel computing to get the speed increases you need?
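For instance, here is a minimal sketch using the parallel package that ships with R. The run_trial() function and the number of trials are placeholders standing in for your own simulation code; to keep the sketch runnable it just refits the Produc example from above:
library(parallel)
library(plm)
data("Produc", package = "plm")

## Hypothetical stand-in for one Monte Carlo trial: in your code this would
## generate a fresh panel and fit your own model.
run_trial <- function(i) {
  fit <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
             data = Produc, index = c("state", "year"))
  cbind(coef(fit), sqrt(diag(vcov(fit))))
}

n_trials <- 100                        # placeholder number of trials
n_cores  <- max(1, detectCores() - 1)

## On Linux/macOS, mclapply() forks the current R session:
results <- mclapply(seq_len(n_trials), run_trial, mc.cores = n_cores)

## On Windows, use a socket cluster instead:
## cl <- makeCluster(n_cores)
## clusterEvalQ(cl, { library(plm); data("Produc", package = "plm") })
## results <- parLapply(cl, seq_len(n_trials), run_trial)
## stopCluster(cl)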
Just use coeftest(zz). coeftest is in the lmtest package; it will give you the coefficients and standard errors from plm objects much more quickly than summary.plm.
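Continuing the Produc example from the answer above, the extraction might look like this (a sketch, relying on coeftest() returning a matrix with "Estimate" and "Std. Error" columns):
library(plm)
library(lmtest)
data("Produc", package = "plm")

zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state", "year"))

ct <- coeftest(zz)                       # coefficient table for the plm fit
coefficients_estimated <- ct[, "Estimate"]
ses_estimated <- ct[, "Std. Error"]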