Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cox proportional hazard model in R vs Stata

I´m trying to replicate in R a cox proportional hazard model estimation from Stata using the following data http://iojournal.org/wp-content/uploads/2015/05/FortnaReplicationData.dta

The command in stata is the following:

stset enddate2009, id(VPFid) fail(warends) origin(time startdate)
stcox HCTrebels o_rebstrength demdum independenceC transformC lnpop lngdppc africa diffreligion warage if keepobs==1, cluster(js_country)

Cox regression -- Breslow method for ties

No. of subjects      =          104                Number of obs   =       566
No. of failures      =           86
Time at risk         =       194190
                                               Wald chi2(10)   =     56.29
Log pseudolikelihood =   -261.94776                Prob > chi2     =    0.0000

                           (Std. Err. adjusted for 49 clusters in js_countryid)
-------------------------------------------------------------------------------
              |               Robust
           _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    HCTrebels |   .4089758   .1299916    -2.81   0.005     .2193542    .7625165
o_rebstrength |   1.157554   .2267867     0.75   0.455     .7884508    1.699447
       demdum |   .5893352   .2353317    -1.32   0.185     .2694405    1.289027
independenceC |   .5348951   .1882826    -1.78   0.075      .268316    1.066328
   transformC |   .5277051   .1509665    -2.23   0.025     .3012164    .9244938
        lnpop |   .9374204   .0902072    -0.67   0.502     .7762899    1.131996
      lngdppc |   .9158258   .1727694    -0.47   0.641     .6327538    1.325534
       africa |   .5707749   .1671118    -1.92   0.055     .3215508    1.013165
 diffreligion |   1.537959   .4472004     1.48   0.139      .869834    2.719275
       warage |   .9632408   .0290124    -1.24   0.214     .9080233    1.021816
-------------------------------------------------------------------------------

With R, I´m using the following:

data <- read.dta("FortnaReplicationData.dta")
data4 <- subset(data, keepobs==1)
data4$end_date <- data4$`_t`
data4$start_date <- data4$`_t0`
levels(data4$o_rebstrength) <- c(0:4)
data4$o_rebstrength <- as.numeric(levels(data4$o_rebstrength[data4$o_rebstrength])
data4 <- data4[,c("start_date", "end_date","HCTrebels",  "o_rebstrength", "demdum", "independenceC", "transformC", "lnpop", "lngdppc", "africa", "diffreligion", "warage", "js_countryid", "warends")]
data4 <- na.omit(data4)
surv <- coxph(Surv(start_date, end_date, warends) ~ HCTrebels+ o_rebstrength +demdum + independenceC+ transformC+ lnpop+ lngdppc+ africa +diffreligion+ warage+cluster(js_countryid), data = data4, robust = TRUE, method="breslow")

                 coef exp(coef) se(coef) robust se     z      p
HCTrebels     -0.8941    0.4090   0.3694    0.3146 -2.84 0.0045
o_rebstrength  0.1463    1.1576   0.2214    0.1939  0.75 0.4505
demdum        -0.5288    0.5893   0.4123    0.3952 -1.34 0.1809
independenceC -0.6257    0.5349   0.3328    0.3484 -1.80 0.0725
transformC    -0.6392    0.5277   0.3384    0.2831 -2.26 0.0240
lnpop         -0.0646    0.9374   0.1185    0.0952 -0.68 0.4974
lngdppc       -0.0879    0.9158   0.2060    0.1867 -0.47 0.6377
africa        -0.5608    0.5708   0.3024    0.2898 -1.94 0.0530
diffreligion   0.4305    1.5380   0.3345    0.2878  1.50 0.1347
warage        -0.0375    0.9632   0.0405    0.0298 -1.26 0.2090

Likelihood ratio test=30.1  on 10 df, p=0.000827
n= 566, number of events= 86 

I get the same hazard ratio coefficients but the standard errors does not look the same. The Z and p values are close but not exactly the same. Why might be the difference between the results in R and Stata?

like image 321
user2246905 Avatar asked Sep 27 '22 02:09

user2246905


1 Answers

As user20650 noticed, when including "nohr" in the Stata options you get exactly the same standard errors as in R. Still there was a small difference in the standard errors when using clusters. user20650 again noticed that the difference was given because Stata default standard errors are multiplied g/(g − 1), where g is the number of cluster while R does not adjust these standard errors. So a solution is just to include noadjust in Stata or have the standard errors adjusted in R by doing:

sqrt(diag(vcov(surv))* (49/48))

If still we want in R to have the same standard errors from Stata, as when not specifying nohr, we need to know that when nhr is left off we obtain $exp(\beta)$ with the standard errors resulting from fitting the model in those scale. In particular obtained by applying the delta method to the original standard-error estimate. "The delta method obtains the standard error of a transformed variable by calculating the variance of the corresponding first-order Taylor expansion, which for the transform $exp(\beta)$ amounts to mutiplying the oringal standard error by $exp(\hat{\beta})$. This trick of calculation yields identical rsults as does transforming the parameters prior to estimation and then reestimating" (Cleves et al 2010). In R we can do it by using:

library(msm)
se <-diag(vcov(surv)* (49/48))
sapply(se, function(x) deltamethod(~ exp(x1), coef(surv)[which(se==x)], x))

     HCTrebels o_rebstrength    demdum independenceC transformC     lnpop   lngdppc    africa diffreligion     warage
     0.1299916     0.2267867 0.2353317     0.1882826  0.1509665 0.0902072 0.1727694 0.1671118    0.4472004 0.02901243
like image 97
user2246905 Avatar answered Oct 11 '22 07:10

user2246905