I´m trying to replicate in R a cox proportional hazard model estimation from Stata using the following data http://iojournal.org/wp-content/uploads/2015/05/FortnaReplicationData.dta
The command in stata is the following:
stset enddate2009, id(VPFid) fail(warends) origin(time startdate)
stcox HCTrebels o_rebstrength demdum independenceC transformC lnpop lngdppc africa diffreligion warage if keepobs==1, cluster(js_country)
Cox regression -- Breslow method for ties
No. of subjects = 104 Number of obs = 566
No. of failures = 86
Time at risk = 194190
Wald chi2(10) = 56.29
Log pseudolikelihood = -261.94776 Prob > chi2 = 0.0000
(Std. Err. adjusted for 49 clusters in js_countryid)
-------------------------------------------------------------------------------
| Robust
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
HCTrebels | .4089758 .1299916 -2.81 0.005 .2193542 .7625165
o_rebstrength | 1.157554 .2267867 0.75 0.455 .7884508 1.699447
demdum | .5893352 .2353317 -1.32 0.185 .2694405 1.289027
independenceC | .5348951 .1882826 -1.78 0.075 .268316 1.066328
transformC | .5277051 .1509665 -2.23 0.025 .3012164 .9244938
lnpop | .9374204 .0902072 -0.67 0.502 .7762899 1.131996
lngdppc | .9158258 .1727694 -0.47 0.641 .6327538 1.325534
africa | .5707749 .1671118 -1.92 0.055 .3215508 1.013165
diffreligion | 1.537959 .4472004 1.48 0.139 .869834 2.719275
warage | .9632408 .0290124 -1.24 0.214 .9080233 1.021816
-------------------------------------------------------------------------------
With R, I´m using the following:
data <- read.dta("FortnaReplicationData.dta")
data4 <- subset(data, keepobs==1)
data4$end_date <- data4$`_t`
data4$start_date <- data4$`_t0`
levels(data4$o_rebstrength) <- c(0:4)
data4$o_rebstrength <- as.numeric(levels(data4$o_rebstrength[data4$o_rebstrength])
data4 <- data4[,c("start_date", "end_date","HCTrebels", "o_rebstrength", "demdum", "independenceC", "transformC", "lnpop", "lngdppc", "africa", "diffreligion", "warage", "js_countryid", "warends")]
data4 <- na.omit(data4)
surv <- coxph(Surv(start_date, end_date, warends) ~ HCTrebels+ o_rebstrength +demdum + independenceC+ transformC+ lnpop+ lngdppc+ africa +diffreligion+ warage+cluster(js_countryid), data = data4, robust = TRUE, method="breslow")
coef exp(coef) se(coef) robust se z p
HCTrebels -0.8941 0.4090 0.3694 0.3146 -2.84 0.0045
o_rebstrength 0.1463 1.1576 0.2214 0.1939 0.75 0.4505
demdum -0.5288 0.5893 0.4123 0.3952 -1.34 0.1809
independenceC -0.6257 0.5349 0.3328 0.3484 -1.80 0.0725
transformC -0.6392 0.5277 0.3384 0.2831 -2.26 0.0240
lnpop -0.0646 0.9374 0.1185 0.0952 -0.68 0.4974
lngdppc -0.0879 0.9158 0.2060 0.1867 -0.47 0.6377
africa -0.5608 0.5708 0.3024 0.2898 -1.94 0.0530
diffreligion 0.4305 1.5380 0.3345 0.2878 1.50 0.1347
warage -0.0375 0.9632 0.0405 0.0298 -1.26 0.2090
Likelihood ratio test=30.1 on 10 df, p=0.000827
n= 566, number of events= 86
I get the same hazard ratio coefficients but the standard errors does not look the same. The Z and p values are close but not exactly the same. Why might be the difference between the results in R and Stata?
As user20650 noticed, when including "nohr" in the Stata options you get exactly the same standard errors as in R. Still there was a small difference in the standard errors when using clusters. user20650 again noticed that the difference was given because Stata default standard errors are multiplied g/(g − 1), where g is the number of cluster while R does not adjust these standard errors. So a solution is just to include noadjust in Stata or have the standard errors adjusted in R by doing:
sqrt(diag(vcov(surv))* (49/48))
If still we want in R to have the same standard errors from Stata, as when not specifying nohr, we need to know that when nhr is left off we obtain $exp(\beta)$ with the standard errors resulting from fitting the model in those scale. In particular obtained by applying the delta method to the original standard-error estimate. "The delta method obtains the standard error of a transformed variable by calculating the variance of the corresponding first-order Taylor expansion, which for the transform $exp(\beta)$ amounts to mutiplying the oringal standard error by $exp(\hat{\beta})$. This trick of calculation yields identical rsults as does transforming the parameters prior to estimation and then reestimating" (Cleves et al 2010). In R we can do it by using:
library(msm)
se <-diag(vcov(surv)* (49/48))
sapply(se, function(x) deltamethod(~ exp(x1), coef(surv)[which(se==x)], x))
HCTrebels o_rebstrength demdum independenceC transformC lnpop lngdppc africa diffreligion warage
0.1299916 0.2267867 0.2353317 0.1882826 0.1509665 0.0902072 0.1727694 0.1671118 0.4472004 0.02901243
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With