Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using clustered covariance matrix in predict.lm()

Tags:

r

I am analyzing a dataset in which data is clustered in several groups (towns in regions). The dataset looks like:

R> df <- data.frame(x = rnorm(10), 
                     y = 3*rnorm(x), 
                     groups = factor(sample(c('0','1'), 10, TRUE)))
R> head(df)
        x     y groups
1 -0.8959  1.54      1
2 -0.1008 -2.73      1
3  0.4406  0.44      0
4  0.0683  1.62      1
5 -0.0037 -0.20      1
6 -0.8966 -2.34      0

I want my lm() estimates to account for intraclass correlation in groups and for that purpose I am using a function cl() that takes an lm() and returns the robust clustered covariance matrix (original here):

cl  <- function(fm, cluster) {
  library(sandwich)
  M <- length(unique(cluster))   
  N <- length(cluster)              
  K <- fm$rank                   
  dfc <- (M/(M-1))*((N-1)/(N-K-1))
  uj  <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum));
  vcovCL <- dfc * sandwich(fm, meat = crossprod(uj)/N)
  return(vcovCL)
}

Now,

output <- lm(y ~ x, data = df)
clcov <- cl(output, df$groups)
coeftest(output, clcov, nrow(df) - 1)

gives me the estimates I need. The problem now is that I want to use the model for prediction, and I need the standard error of the prediction to be calculated with the new covariance matrix clcov. That is, I need

predict(output, se.fit = TRUE)

but using clcov instead of vcov(output). Something like a vcov() <- would be perfect.

Of course, I could write my own function to do predictions, but I am just wondering whether there is a more practical method that allows me to use methods for signature lm (like arm::sim).

like image 390
griverorz Avatar asked Feb 04 '23 02:02

griverorz


2 Answers

The se.fit in predict is not calculated using the vcov matrix, but using the qr decomposition and the residual variance. This goes for the vcov() function as well: it takes the unscaled cov matrix from the summary.lm() together with the residual variance, and uses those ones. And the unscaled cov matrix is - again- calculated from the QR decomposition.

So I'm afraid the answer is "no, there is no other option than to write your own function". You can't really set the vcov matrix, as it is recalculated when needed. Yet, writing your own function is rather trivial.

predict.rob <- function(x,clcov,newdata){
    if(missing(newdata)){ newdata <- x$model }
    m.mat <- model.matrix(x$terms,data=newdata)
    m.coef <- x$coef
    fit <- as.vector(m.mat %*% x$coef)
    se.fit <- sqrt(diag(m.mat%*%clcov%*%t(m.mat)))
    return(list(fit=fit,se.fit=se.fit))
}

I didn't use the predict() function to avoid unnecessary calculations. It wouldn't shorten the code too much anyway.


On a side note, questions like this are better asked on stats.stackexchange.com

like image 68
Joris Meys Avatar answered Feb 05 '23 15:02

Joris Meys


I modified the above code slightly to be more consistent with the predict function--this way you are not expected to enter values for the outcome in the newdata dataframe

predict.rob <- function(x,clcov,newdata){
if(missing(newdata)){ newdata <- x$model }
tt <- terms(x)
Terms <- delete.response(tt)
m.mat <- model.matrix(Terms,data=newdata)
m.coef <- x$coef
fit <- as.vector(m.mat %*% x$coef)
se.fit <- sqrt(diag(m.mat%*%clcov%*%t(m.mat)))
return(list(fit=fit,se.fit=se.fit))}
like image 40
Michael Avatar answered Feb 05 '23 14:02

Michael