I am analyzing a dataset in which data is clustered in several groups (towns in regions). The dataset looks like:
R> df <- data.frame(x = rnorm(10),
y = 3*rnorm(x),
groups = factor(sample(c('0','1'), 10, TRUE)))
R> head(df)
x y groups
1 -0.8959 1.54 1
2 -0.1008 -2.73 1
3 0.4406 0.44 0
4 0.0683 1.62 1
5 -0.0037 -0.20 1
6 -0.8966 -2.34 0
I want my lm() estimates to account for intraclass correlation in groups and for that purpose I am using a function cl()
that takes an lm()
and returns the robust clustered covariance matrix (original here):
cl <- function(fm, cluster) {
library(sandwich)
M <- length(unique(cluster))
N <- length(cluster)
K <- fm$rank
dfc <- (M/(M-1))*((N-1)/(N-K-1))
uj <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum));
vcovCL <- dfc * sandwich(fm, meat = crossprod(uj)/N)
return(vcovCL)
}
Now,
output <- lm(y ~ x, data = df)
clcov <- cl(output, df$groups)
coeftest(output, clcov, nrow(df) - 1)
gives me the estimates I need. The problem now is that I want to use the model for prediction, and I need the standard error of the prediction to be calculated with the new covariance matrix clcov
. That is, I need
predict(output, se.fit = TRUE)
but using clcov
instead of vcov(output)
. Something like a vcov() <-
would be perfect.
Of course, I could write my own function to do predictions, but I am just wondering whether there is a more practical method that allows me to use methods for signature lm
(like arm::sim).
The se.fit in predict is not calculated using the vcov matrix, but using the qr decomposition and the residual variance. This goes for the vcov() function as well: it takes the unscaled cov matrix from the summary.lm() together with the residual variance, and uses those ones. And the unscaled cov matrix is - again- calculated from the QR decomposition.
So I'm afraid the answer is "no, there is no other option than to write your own function". You can't really set the vcov matrix, as it is recalculated when needed. Yet, writing your own function is rather trivial.
predict.rob <- function(x,clcov,newdata){
if(missing(newdata)){ newdata <- x$model }
m.mat <- model.matrix(x$terms,data=newdata)
m.coef <- x$coef
fit <- as.vector(m.mat %*% x$coef)
se.fit <- sqrt(diag(m.mat%*%clcov%*%t(m.mat)))
return(list(fit=fit,se.fit=se.fit))
}
I didn't use the predict() function to avoid unnecessary calculations. It wouldn't shorten the code too much anyway.
On a side note, questions like this are better asked on stats.stackexchange.com
I modified the above code slightly to be more consistent with the predict function--this way you are not expected to enter values for the outcome in the newdata dataframe
predict.rob <- function(x,clcov,newdata){
if(missing(newdata)){ newdata <- x$model }
tt <- terms(x)
Terms <- delete.response(tt)
m.mat <- model.matrix(Terms,data=newdata)
m.coef <- x$coef
fit <- as.vector(m.mat %*% x$coef)
se.fit <- sqrt(diag(m.mat%*%clcov%*%t(m.mat)))
return(list(fit=fit,se.fit=se.fit))}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With