Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Panel data regression: Robust standard errors

my problem is this: I get NA where I should get some values in the computation of robust standard errors.

I am trying to do a fixed effect panel regression with cluster-robust standard errors. For this, I follow Arai (2011) who on p. 3 follows Stock/ Watson (2006) (later published in Econometrica, for those who have access). I would like to correct the degrees of freedom by (M/(M-1)*(N-1)/(N-K) against downward bias as my number of clusters is finite and I have unbalanced data.

Similar problems have been posted before [1, 2] on StackOverflow and related problems [3] on CrossValidated.

Arai (and the answer in the 1st link) uses the following code for functions (I provide my data below with some further comment):

gcenter <- function(df1,group) {
    variables <- paste(
        rep("C", ncol(df1)), colnames(df1), sep=".")
    copydf <- df1
    for (i in 1:ncol(df1)) {
        copydf[,i] <- df1[,i] - ave(df1[,i], group,FUN=mean)}
    colnames(copydf) <- variables
    return(cbind(df1,copydf))}

# 1-way adjusting for clusters
clx <- function(fm, dfcw, cluster){
    # R-codes (www.r-project.org) for computing
    # clustered-standard errors. Mahmood Arai, Jan 26, 2008.
    # The arguments of the function are:
    # fitted model, cluster1 and cluster2
    # You need to install libraries `sandwich' and `lmtest'
    # reweighting the var-cov matrix for the within model
    library(sandwich);library(lmtest)
    M <- length(unique(cluster))   
    N <- length(cluster)           
    K <- fm$rank                        
    dfc <- (M/(M-1))*((N-1)/(N-K))  
    uj  <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum));
    vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)*dfcw
    coeftest(fm, vcovCL) }

,where the gcenter computes deviations from the mean (fixed effect). I then continue and do the regression with DS_CODEbeing my cluster variable (I have named my data 'data').

centerdata <- gcenter(data, data$DS_CODE)
datalm <- lm(C.L1.retE1M ~ C.MCAP_SEC + C.Impact_change + C.Mom + C.BM + C.PD + C.CashGen + C.NITA + C.PE + C.PEdummy + factor(DS_CODE), data=centerdata)
M <- length(unique(data$DS_CODE))
dfcw <- datalm$df / (datalm$df - (M-1))

and want to calculate

clx(datalm, dfcw, data$DS_CODE)

However, when I want to compute uj (see formula clx above) for the variance, I get only at the beginning some values for my regressors, then lots of zeros. If this input uj is used for the variance, only NAs result.

My data

Since my data may be of special structure and I can't figure out the problem, I post the entire thing as a link from Hotmail. The reason is that with other data (taken from Arai (2011)) my problem does not occur. Sorry in advance for the mess but I'd be very grateful if you could have a look at it nevertheless. The file is a 5mb .txt file containing purely data.

like image 406
Jan Avatar asked May 21 '12 19:05

Jan


1 Answers

After some time playing around, it works for me and gives me:

                         Estimate  Std. Error t value  Pr(>|t|)    
(Intercept)            4.5099e-16  5.2381e-16  0.8610  0.389254    
C.MCAP_SEC            -5.9769e-07  1.2677e-07 -4.7149 2.425e-06 ***
C.Impact_change       -5.3908e-04  7.5601e-05 -7.1306 1.014e-12 ***
C.Mom                  3.7560e-04  3.3378e-03  0.1125  0.910406    
C.BM                  -1.6438e-04  1.7368e-05 -9.4645 < 2.2e-16 ***
C.PD                   6.2153e-02  3.8766e-02  1.6033  0.108885    
C.CashGen             -2.7876e-04  1.4031e-02 -0.0199  0.984149    
C.NITA                -8.1792e-02  3.2153e-02 -2.5438  0.010969 *  
C.PE                  -6.6170e-06  4.0138e-06 -1.6485  0.099248 .  
C.PEdummy              1.3143e-02  4.8864e-03  2.6897  0.007154 ** 
factor(DS_CODE)130324 -5.2497e-16  5.2683e-16 -0.9965  0.319028    
factor(DS_CODE)130409 -4.0276e-16  5.2384e-16 -0.7689  0.441986    
factor(DS_CODE)130775 -4.4113e-16  5.2424e-16 -0.8415  0.400089  
...

This leaves us with the question why it doesn't for you. I guess it has something to do with the format of your data. Is everything numeric? I converted the column classes and it looks like that for me:

str(dat)
'data.frame':   48251 obs. of  12 variables:
 $ DS_CODE      : chr  "902172" "902172" "902172" "902172" ...
 $ DNEW         : num  2e+05 2e+05 2e+05 2e+05 2e+05 ...
 $ MCAP_SEC     : num  78122 71421 81907 80010 82462 ...
 $ NITA         : num  0.135 0.135 0.135 0.135 0.135 ...
 $ CashGen      : num  0.198 0.198 0.198 0.198 0.198 ...
 $ BM           : num  0.1074 0.1108 0.097 0.0968 0.0899 ...
 $ PE           : num  57 55.3 63.1 63.2 68 ...
 $ PEdummy      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ L1.retE1M    : num  -0.72492 0.13177 0.00122 0.07214 -0.07332 ...
 $ Mom          : num  0 0 0 0 0 ...
 $ PD           : num  5.41e-54 1.51e-66 3.16e-80 2.87e-79 4.39e-89 ...
 $ Impact_change: num  0 -10.59 -10.43 0.7 -6.97 ...

What does str(data) return for you?

like image 124
Christoph_J Avatar answered Sep 29 '22 15:09

Christoph_J