How does cox.zph deal with time-dependent covariates?

Tags:

I have a coxph model with 5 time-dependent and 2 time-independent variables. I want to test the proportional hazards assumption and besides martingale and deviance residuals, using cox.zph. My question is, how does this function deal with time-dependent covariates?

After reading Grant et al.,2014, I am not sure if this is the recommended goodness-of-fit test to assess the PH assumption for time-varying covariates.

Model:

teste<-coxph(Surv(tempo1,tempo2,status)~sexo+CODE_06+factor(clima)+TP_media7
             +ndvi+peso+epoca,data=newftable,na.action=na.fail)

> cox.zph(teste)
                         rho    chisq      p
sexoM                 0.0844  0.32363 0.5694
CODE_06Regadio        0.1531  0.66865 0.4135
CODE_06Sequeiro       0.2278  1.65735 0.1980
factor(clima)8       -0.1823  1.16522 0.2804
factor(clima)9        0.1051  0.24456 0.6209
factor(clima)15      -0.0193  0.00945 0.9226
TP_media7(12,22]      0.1689  0.75604 0.3846
TP_media7(22,32]      0.1797  1.03731 0.3084
TP_media7(32,41]      0.1060  0.34036 0.5596
ndvi(3e+03,4e+03]    -0.1595  1.00006 0.3173
ndvi(4e+03,5e+03]     0.0421  0.05233 0.8191
ndvi(5e+03,6e+03]     0.1750  0.98816 0.3202
ndvi(6e+03,8.05e+03] -0.0311  0.02880 0.8653
peso[850,1005]        0.2534  3.34964 0.0672
epocamid_inv_rep      0.0193  0.01219 0.9121
epocamid_pos_inv     -0.2193  0.93355 0.3339
epocamid_rep_pos      0.0231  0.01341 0.9078
epocapos_repr         0.2073  1.09893 0.2945
epocarepr             0.0766  0.12905 0.7194
GLOBAL                    NA 19.79229 0.4072

518

asked Jun 30 '14 10:06

JMarcelino

2 Answers

As I understand it cox.zph is a test as to whether a covariate should enter the model as independent of time. If you already know that your predictor is time-dependent then this does not seem to be the appropriate approach. I'm not aware of an easy way to go about this and such a question may find a more receptive audience on Cross Validated.

For a reproducible example, we can use that from Therneau:

library(survival)
veteran$celltype <- relevel(veteran$celltype, ref="adeno")
f1 <- coxph(Surv(time, status) ~
            trt + celltype + karno + diagtime + age + prior,
            data=veteran)
(z1 <- cox.zph(f1, transform="log"))

                       rho   chisq        p
trt               -0.01561  0.0400 0.841486
celltypesquamous  -0.16278  3.8950 0.048431
celltypesmallcell -0.11908  2.2199 0.136238
celltypelarge      0.00942  0.0121 0.912551
karno              0.29329 11.8848 0.000566
diagtime           0.11317  1.6951 0.192930
age                0.20984  6.5917 0.010245
prior             -0.16683  3.9873 0.045844
GLOBAL                  NA 27.5319 0.000572

rho is Pearson's correlation between the scaled Shoenfeld residuals and g(t) where g is a function of time (default is the Kaplan-Meier scale; here we are using log, as you can see on the scale of the x axis in the plot below). If the variable is time-invariant then the slope of the plotted line should be zero. This is essentially what chisq tests.

Update @Didi Ingabire - in light of your comments:

Thus a low p-value indicates:

the Schoenfeld residuals are not constant over time
there is evidence that the variable/predictor may be time-dependent
the proportional-hazards assumption (made when generating the coxph model) may be violated by this variable

You can see this visually like so:

for (i in 1:(nrow(z1$table)-1)){
    plot(z1[i], main="Scaled Schoenfeld residuals by time with smooth spline
If <0 indicates protective effect")
    graphics::abline(a=0, b=0, col="black")
}

which gives e.g.:

enter image description here

Update @JMarcelino This is to say that cox.zph is a test of the final form of the model, to ensure that the residuals are relatively constant over time.

If one of the variables is already a function of time (when it enters the model), this won't affect the test. In fact it should be more likely to produce a flat line with a high p-value if the influence of time is modeled correctly.

Also, testing proportional hazards means testing is the hazard ratio constant over time?. Whether the variable is time-dependent or not (when it enters the model) is unimportant. What is being tested is the final form of the model.

For example, instead of karno we can enter a variable which is related to both it and to time like so:

f2 <- coxph(Surv(time, status) ~
            trt + celltype + log(karno * time) + diagtime + age + prior,
            data=veteran)
(z2 <- cox.zph(f2, transform="log"))

                      rho  chisq     p
trt                0.0947 1.4639 0.226
celltypesquamous  -0.0819 1.1085 0.292
celltypesmallcell -0.0897 1.3229 0.250
celltypelarge      0.0247 0.0968 0.756
log(karno * time) -0.0836 0.6347 0.426
diagtime           0.0463 0.2723 0.602
age                0.0532 0.3493 0.554
prior             -0.0542 0.3802 0.538
GLOBAL                 NA 7.6465 0.469

This gives us a model which better fits the proportional hazards assumption. However the interpretation of the coefficient log(karno * time) is not particularly intuitive and unlikely to be of great practical value.

answered Sep 18 '22 12:09

dardisco

Its important to distinguish between time-dependent variable and a variable that does not meet the PH assumption.

A time-dependent variable is one that vary with time. This could be blood pressure; it will vary on different occasions. Sex (gender) will however not vary on different occasions.

Then there is a distinction between internal and external time-dependent variables:

• Internal time-dependent variables: are variables that vary because of changes within the individual (e.g blood pressure).

• External time-dependent variables: environmental/external changes that modify the hazard experienced by an individual (e.g as industries proliferate in a city, air pollution increases with time and so the hazard in the population increases for conditions such as myocardial infarction).

Regardless of the nature of a variable, fixed or time-dependent, it can violate the PH assumption. I could provide a few examples, but it is probably easier to just accept the fact that any variable might violate PH assumption. It can even if You try to accomodate the variations in an extended Cox model (e.g using multiple observations per individual in counting-process format).

The solution: you can enter any predictor into a Cox model and check if it fulfills the PH by the cox.zph function of Thernau's survival package. The corresponding statement in SAS would be the 'assess ph / resample' statement. If a variable violates PH, the (probably) simplest way to solve this is to introduce an interaction between that variable and your time variable.

Example follows. This is the Cox formula:

Survival = age + sex + blood_pressure

Lets say blood pressure violates PH --> Introduce the following term:

Survival = age + sex + blood_pressure*survival_time_variable

This should solve it but you cannot interpret the main effect of blood-pressure because that variable now depends om time.

Another solution is to stratify your model, but this would not be appropriate for a continuous variable and for categorical variables, once stratified, the variable is not included as a covariate in the resulting model (ie you wont be getting a hazard ratio).

answered Sep 20 '22 12:09

Adam Robinsson

Related questions
                            
                                Dual seasonal cycles in ts object
                            
                                How can I print each text element on a new line after a tab indent?
                            
                                ggplot2: Order in legend alphabetically instead of following the order of appearance in the dataset
                            
                                R code that evaluates line-of-sight (LOS) between two (lat, lon) points
                            
                                Fastest way to do this double summation?
                            
                                How to add abline in ggplot2 with x-axis as year?
                            
                                how to replace legend 'bullet' of geom_text guide (legend)
                            
                                Strange error: formal arguments omitted in the method definition cannot be in the signature
                            
                                choose n most distant points in R
                            
                                Divide every number in every column by 1000 in R
                            
                                Choosing specific lags in ARIMA or VAR Model
                            
                                passing function argument to dplyr select
                            
                                inline function code doesn't compile
                            
                                Sample equidistant points from a numeric vector
                            
                                Are we able to generate a list of loaded packages in R?
                            
                                listing all subsets of a vector for a given size
                            
                                Converting .Rd file to plain text
                            
                                R-plot a centered legend at outer margins of multiple plots
                            
                                ggplot2 - Modify geom_density2d to accept weights as a parameter?
                            
                                merge partial matched strings

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With